2025-05-11 10:33:04,561 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-sac-aug-mem2
2025-05-11 10:33:04,562 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-sac-aug-mem2
2025-05-11 10:33:04,562 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeClogL1U23': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x7426241ce3d0>}
2025-05-11 10:33:04,562 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1111 [DEBUG]: using device: cpu
2025-05-11 10:33:04,562 baseline-sac-noisy-halfcheetah:77 [WARNING]: args.memorize_actions != args.horizon: 2 != 24
2025-05-11 10:33:04,572 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1133 [INFO]: Creating new trainer
2025-05-11 10:33:04,583 baseline-sac-noisy-halfcheetah:111 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=29, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-05-11 10:33:04,583 baseline-sac-noisy-halfcheetah:112 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=35, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-11 10:33:04,815 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1194 [DEBUG]: Starting training session...
2025-05-11 10:33:04,815 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 1/100
2025-05-11 10:35:58,672 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 10:36:11,967 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -326.90552 ± 61.499
2025-05-11 10:36:11,968 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-311.32712, -342.89545, -459.6673, -292.2297, -245.9298, -288.37195, -385.67636, -260.2585, -308.64267, -374.0567]
2025-05-11 10:36:11,968 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 10:36:11,968 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (-326.91) for latency ExtremeClogL1U23
2025-05-11 10:36:11,968 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 10:36:11,972 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 10:36:11,978 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 5 hours, 8 minutes, 49 seconds)
2025-05-11 10:39:32,546 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 10:39:46,171 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -213.40181 ± 77.462
2025-05-11 10:39:46,171 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-341.67056, -239.11804, -141.58676, -233.47473, -89.3013, -116.37413, -187.23877, -266.00726, -210.68842, -308.5581]
2025-05-11 10:39:46,171 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 10:39:46,171 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (-213.40) for latency ExtremeClogL1U23
2025-05-11 10:39:46,171 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 10:39:46,176 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 10:39:46,182 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 5 hours, 27 minutes, 46 seconds)
2025-05-11 10:42:40,430 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 10:42:53,114 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -59.29734 ± 87.382
2025-05-11 10:42:53,114 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-45.313263, -0.5495181, 58.30902, -108.77433, -6.422038, -172.29703, -29.33482, -142.08185, 54.660553, -201.17012]
2025-05-11 10:42:53,114 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 10:42:53,114 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (-59.30) for latency ExtremeClogL1U23
2025-05-11 10:42:53,114 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 10:42:53,119 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 10:42:53,125 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 5 hours, 17 minutes, 2 seconds)
2025-05-11 10:45:45,441 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 10:45:58,109 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -74.34377 ± 68.597
2025-05-11 10:45:58,109 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-84.64177, 30.366655, -19.318926, -102.05346, -31.767061, -97.65688, -54.264336, -69.203926, -69.15842, -245.73953]
2025-05-11 10:45:58,109 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 10:45:58,111 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 5 hours, 9 minutes, 19 seconds)
2025-05-11 10:48:50,158 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 10:49:02,804 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 8.70368 ± 109.368
2025-05-11 10:49:02,805 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-54.67316, 177.56094, -133.28296, 30.48662, 3.855367, 3.6633325, 71.91098, 61.399944, -203.46863, 129.58437]
2025-05-11 10:49:02,805 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 10:49:02,805 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (8.70) for latency ExtremeClogL1U23
2025-05-11 10:49:02,805 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 10:49:02,809 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 10:49:02,816 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 5 hours, 3 minutes, 22 seconds)
2025-05-11 10:51:53,249 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 10:52:05,861 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 148.64429 ± 113.440
2025-05-11 10:52:05,861 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [90.39651, 222.28206, 254.32155, 73.44824, 102.95945, -114.16804, 136.24168, 295.2161, 181.22693, 244.51846]
2025-05-11 10:52:05,861 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 10:52:05,861 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (148.64) for latency ExtremeClogL1U23
2025-05-11 10:52:05,862 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 10:52:05,866 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 10:52:05,874 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 4 hours, 58 minutes, 53 seconds)
2025-05-11 10:54:55,752 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 10:55:08,523 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 63.94239 ± 214.450
2025-05-11 10:55:08,523 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-211.17055, 161.06003, 270.16644, 185.59644, 110.87696, -172.57483, 305.134, -354.73276, 138.99226, 206.07602]
2025-05-11 10:55:08,523 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 10:55:08,525 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 4 hours, 45 minutes, 55 seconds)
2025-05-11 10:57:56,125 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 10:58:08,839 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 147.67506 ± 301.026
2025-05-11 10:58:08,839 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [9.956732, 579.6513, -112.76982, 524.9384, -175.10071, -142.70811, -252.76495, 231.89803, 397.66452, 415.9852]
2025-05-11 10:58:08,839 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 10:58:08,846 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 4 hours, 40 minutes, 49 seconds)
2025-05-11 11:00:57,620 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 11:01:10,284 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 376.25412 ± 297.050
2025-05-11 11:01:10,284 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-24.44823, 258.65665, 901.07623, 253.91963, 553.86475, 36.55049, 727.18274, 45.85319, 453.89853, 555.9876]
2025-05-11 11:01:10,284 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:01:10,284 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (376.25) for latency ExtremeClogL1U23
2025-05-11 11:01:10,285 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 11:01:10,289 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 11:01:10,296 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 4 hours, 36 minutes, 41 seconds)
2025-05-11 11:03:57,756 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 11:04:10,362 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1026.55786 ± 493.237
2025-05-11 11:04:10,362 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [87.63206, 1387.5238, 1186.225, 1496.5236, 1319.2324, 1007.5816, 219.32666, 1540.4758, 716.99994, 1304.0571]
2025-05-11 11:04:10,363 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:04:10,363 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (1026.56) for latency ExtremeClogL1U23
2025-05-11 11:04:10,363 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 11:04:10,367 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 11:04:10,375 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 4 hours, 32 minutes, 16 seconds)
2025-05-11 11:06:57,822 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 11:07:10,412 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1746.24976 ± 260.648
2025-05-11 11:07:10,413 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1758.5378, 2058.942, 2052.1191, 1555.4563, 1846.4761, 1840.474, 1199.217, 1772.567, 1940.5048, 1438.2047]
2025-05-11 11:07:10,413 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:07:10,413 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (1746.25) for latency ExtremeClogL1U23
2025-05-11 11:07:10,413 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 11:07:10,418 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 11:07:10,424 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 4 hours, 28 minutes, 21 seconds)
2025-05-11 11:09:57,330 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 11:10:09,901 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2093.40552 ± 231.816
2025-05-11 11:10:09,901 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1997.5117, 1973.3108, 2436.6572, 1639.5532, 2104.0215, 2303.8481, 2289.1736, 1842.1782, 2298.73, 2049.0708]
2025-05-11 11:10:09,901 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:10:09,902 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (2093.41) for latency ExtremeClogL1U23
2025-05-11 11:10:09,902 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 11:10:09,906 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 11:10:09,913 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 4 hours, 24 minutes, 24 seconds)
2025-05-11 11:12:57,860 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 11:13:10,988 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2272.91650 ± 134.043
2025-05-11 11:13:10,988 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2127.458, 2493.5166, 2139.5244, 2359.393, 2378.8943, 2268.1165, 2206.961, 2118.542, 2175.1836, 2461.5754]
2025-05-11 11:13:10,988 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:13:10,989 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (2272.92) for latency ExtremeClogL1U23
2025-05-11 11:13:10,989 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 11:13:10,994 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 11:13:11,001 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 4 hours, 21 minutes, 37 seconds)
2025-05-11 11:16:01,279 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 11:16:13,894 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2134.63794 ± 230.058
2025-05-11 11:16:13,894 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2059.918, 2361.6074, 2049.064, 2179.7373, 1876.6235, 2166.8203, 1809.804, 2203.903, 1990.3788, 2648.523]
2025-05-11 11:16:13,894 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:16:13,897 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 4 hours, 19 minutes, 1 second)
2025-05-11 11:19:01,837 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 11:19:14,908 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2564.45386 ± 279.893
2025-05-11 11:19:14,908 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2384.007, 2752.5117, 2764.464, 2711.25, 2665.6902, 2622.5295, 2635.3953, 2483.6477, 1811.625, 2813.4187]
2025-05-11 11:19:14,908 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:19:14,908 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (2564.45) for latency ExtremeClogL1U23
2025-05-11 11:19:14,909 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 11:19:14,912 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 11:19:14,930 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 4 hours, 16 minutes, 17 seconds)
2025-05-11 11:22:02,416 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 11:22:14,437 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2334.26831 ± 266.346
2025-05-11 11:22:14,437 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2494.019, 2761.551, 2639.3154, 2569.7356, 2049.176, 2172.7812, 2385.2896, 2163.5137, 1886.0132, 2221.2896]
2025-05-11 11:22:14,438 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:22:14,441 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 4 hours, 13 minutes, 7 seconds)
2025-05-11 11:24:57,053 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 11:25:09,098 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2219.53662 ± 683.021
2025-05-11 11:25:09,098 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2658.2715, 1765.4728, 2462.4841, 379.65112, 2486.4746, 2074.0898, 2404.4658, 2868.9116, 2718.8557, 2376.6924]
2025-05-11 11:25:09,098 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:25:09,101 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 4 hours, 8 minutes, 46 seconds)
2025-05-11 11:27:52,254 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 11:28:04,237 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2668.35132 ± 237.835
2025-05-11 11:28:04,237 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2960.8015, 2213.9958, 2894.4016, 2486.2644, 2584.3545, 2435.0552, 2994.7993, 2661.0203, 2641.0098, 2811.8132]
2025-05-11 11:28:04,237 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:28:04,237 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (2668.35) for latency ExtremeClogL1U23
2025-05-11 11:28:04,238 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 11:28:04,242 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 11:28:04,249 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 4 hours, 4 minutes, 9 seconds)
2025-05-11 11:30:47,576 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 11:30:59,530 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2445.73096 ± 304.636
2025-05-11 11:30:59,530 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2760.8962, 2672.5522, 2057.0862, 2429.4568, 2012.7319, 2119.6187, 2759.248, 2216.695, 2850.2834, 2578.7412]
2025-05-11 11:30:59,530 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:30:59,533 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 3 hours, 59 minutes, 7 seconds)
2025-05-11 11:33:41,890 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 11:33:53,560 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2632.78125 ± 239.837
2025-05-11 11:33:53,560 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2924.16, 2740.5698, 2382.9558, 2646.9448, 2700.953, 2648.6392, 2260.7053, 2948.5786, 2815.2964, 2259.009]
2025-05-11 11:33:53,560 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:33:53,564 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 3 hours, 54 minutes, 18 seconds)
2025-05-11 11:36:43,550 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 11:36:54,946 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2581.14990 ± 252.811
2025-05-11 11:36:54,946 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2204.3809, 2524.223, 2219.4915, 2931.4358, 2487.776, 2676.9795, 2687.5383, 2732.3533, 2381.0522, 2966.2693]
2025-05-11 11:36:54,946 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:36:54,950 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 3 hours, 51 minutes, 52 seconds)
2025-05-11 11:39:39,021 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 11:39:50,342 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2545.09424 ± 523.968
2025-05-11 11:39:50,342 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2462.886, 2811.3535, 2838.2832, 2928.1257, 2999.1084, 2789.7622, 2366.5483, 1099.9806, 2424.506, 2730.3914]
2025-05-11 11:39:50,342 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:39:50,346 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 3 hours, 49 minutes, 7 seconds)
2025-05-11 11:42:32,568 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 11:42:43,883 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2512.59863 ± 218.037
2025-05-11 11:42:43,883 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2124.0715, 2375.3699, 2347.3027, 2453.889, 2769.2546, 2430.8975, 2398.0454, 2865.9001, 2617.8225, 2743.433]
2025-05-11 11:42:43,883 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:42:43,887 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 3 hours, 45 minutes, 46 seconds)
2025-05-11 11:45:26,098 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 11:45:37,536 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2703.88184 ± 240.304
2025-05-11 11:45:37,537 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2670.307, 2520.2307, 2114.8306, 2852.0422, 2826.4128, 2811.809, 2899.6047, 2649.0564, 2667.568, 3026.9565]
2025-05-11 11:45:37,537 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:45:37,537 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (2703.88) for latency ExtremeClogL1U23
2025-05-11 11:45:37,537 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 11:45:37,541 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 11:45:37,549 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 3 hours, 42 minutes, 25 seconds)
2025-05-11 11:48:19,660 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 11:48:31,261 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2833.23853 ± 174.767
2025-05-11 11:48:31,261 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2840.7673, 2995.784, 2828.3872, 2992.146, 2531.3884, 3151.7583, 2645.3481, 2681.6082, 2870.1401, 2795.0562]
2025-05-11 11:48:31,261 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:48:31,261 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (2833.24) for latency ExtremeClogL1U23
2025-05-11 11:48:31,262 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 11:48:31,265 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 11:48:31,273 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 3 hours, 39 minutes, 25 seconds)
2025-05-11 11:51:13,962 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 11:51:25,803 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2742.33740 ± 406.720
2025-05-11 11:51:25,803 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2150.5696, 3113.7075, 2365.7485, 2936.517, 2521.824, 2700.0378, 2927.3035, 3360.398, 2168.4055, 3178.8618]
2025-05-11 11:51:25,803 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:51:25,808 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 3 hours, 34 minutes, 48 seconds)
2025-05-11 11:54:11,873 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 11:54:24,058 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2786.41602 ± 417.366
2025-05-11 11:54:24,059 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2692.157, 2791.46, 1886.3677, 2995.3792, 3236.164, 3239.1628, 2562.5042, 3018.318, 2297.2703, 3145.3772]
2025-05-11 11:54:24,059 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:54:24,063 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 3 hours, 32 minutes, 36 seconds)
2025-05-11 11:57:11,020 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 11:57:23,185 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2782.35913 ± 286.105
2025-05-11 11:57:23,185 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2621.9102, 2931.8857, 2502.111, 2232.4463, 2624.6003, 2837.561, 2735.0056, 3036.148, 3258.6433, 3043.2817]
2025-05-11 11:57:23,185 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:57:23,189 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 3 hours, 31 minutes, 1 second)
2025-05-11 12:00:09,980 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 12:00:22,048 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2735.15332 ± 360.741
2025-05-11 12:00:22,048 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2662.5447, 2317.6729, 2948.0645, 2533.8376, 3135.6887, 2255.3748, 2577.1553, 3301.3591, 3189.7664, 2430.068]
2025-05-11 12:00:22,048 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:00:22,052 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 3 hours, 29 minutes, 19 seconds)
2025-05-11 12:03:08,868 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 12:03:20,881 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2813.75830 ± 462.178
2025-05-11 12:03:20,882 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2952.3752, 3016.9573, 3207.1785, 2855.2056, 3365.1772, 3024.475, 2137.9834, 3088.2883, 2687.43, 1802.5122]
2025-05-11 12:03:20,882 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:03:20,886 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 3 hours, 27 minutes, 34 seconds)
2025-05-11 12:06:07,727 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 12:06:19,671 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2966.91748 ± 238.140
2025-05-11 12:06:19,671 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3155.6143, 2974.9485, 3026.517, 3328.3123, 2567.362, 2642.5808, 3087.1086, 2726.0154, 2949.5554, 3211.1643]
2025-05-11 12:06:19,671 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:06:19,671 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (2966.92) for latency ExtremeClogL1U23
2025-05-11 12:06:19,672 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 12:06:19,675 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 12:06:19,684 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 3 hours, 25 minutes, 35 seconds)
2025-05-11 12:09:07,113 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 12:09:19,314 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2555.96973 ± 1011.873
2025-05-11 12:09:19,314 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3241.609, 2816.274, 3176.7612, -330.26514, 3171.0698, 2181.8477, 2493.556, 2979.597, 3029.388, 2799.8586]
2025-05-11 12:09:19,314 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:09:19,319 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 3 hours, 22 minutes, 55 seconds)
2025-05-11 12:12:06,589 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 12:12:19,151 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2871.32300 ± 373.495
2025-05-11 12:12:19,151 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2948.6067, 3241.072, 2688.552, 3038.3164, 2745.9001, 2694.9937, 3237.7573, 3088.6072, 3109.8406, 1919.5862]
2025-05-11 12:12:19,151 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:12:19,156 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 3 hours, 20 minutes, 5 seconds)
2025-05-11 12:15:07,049 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 12:15:19,629 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2996.74438 ± 284.020
2025-05-11 12:15:19,629 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2524.1653, 3172.9592, 3138.005, 2892.1167, 2452.092, 3078.236, 3222.7964, 2999.2913, 3393.6162, 3094.1653]
2025-05-11 12:15:19,629 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:15:19,629 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (2996.74) for latency ExtremeClogL1U23
2025-05-11 12:15:19,629 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 12:15:19,633 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 12:15:19,644 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 3 hours, 17 minutes, 28 seconds)
2025-05-11 12:18:07,522 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 12:18:20,105 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2954.41577 ± 191.405
2025-05-11 12:18:20,105 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3031.2773, 3082.2688, 2881.4019, 2812.7107, 3128.4946, 3038.5984, 2519.6025, 3244.1458, 2941.4038, 2864.2532]
2025-05-11 12:18:20,105 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:18:20,111 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 3 hours, 14 minutes, 49 seconds)
2025-05-11 12:21:07,335 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 12:21:19,906 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2977.91113 ± 344.883
2025-05-11 12:21:19,906 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2853.2349, 2937.8472, 2897.2139, 2863.9668, 2551.295, 3605.7375, 2593.4539, 3181.9443, 3553.333, 2741.0845]
2025-05-11 12:21:19,906 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:21:19,911 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 3 hours, 12 minutes, 2 seconds)
2025-05-11 12:24:06,893 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 12:24:19,460 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3147.65869 ± 463.899
2025-05-11 12:24:19,460 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3383.6042, 3688.8403, 2663.2173, 3192.4214, 3465.4573, 3347.502, 3568.834, 3220.0537, 2064.8076, 2881.8484]
2025-05-11 12:24:19,460 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:24:19,461 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (3147.66) for latency ExtremeClogL1U23
2025-05-11 12:24:19,461 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 12:24:19,464 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 12:24:19,475 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 3 hours, 9 minutes, 1 second)
2025-05-11 12:27:06,860 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 12:27:19,454 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3014.75635 ± 185.139
2025-05-11 12:27:19,454 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3149.8394, 3235.087, 3192.0818, 3021.6353, 3056.445, 3114.4202, 3115.2866, 2655.3708, 2850.9016, 2756.497]
2025-05-11 12:27:19,455 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:27:19,460 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 3 hours, 6 minutes, 3 seconds)
2025-05-11 12:30:07,492 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 12:30:20,062 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3000.89453 ± 303.463
2025-05-11 12:30:20,062 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2617.4592, 3670.6675, 2860.405, 3070.2622, 3227.4324, 2811.8447, 2816.922, 3269.554, 2991.462, 2672.9373]
2025-05-11 12:30:20,063 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:30:20,069 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 3 hours, 3 minutes, 5 seconds)
2025-05-11 12:33:06,959 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 12:33:19,558 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2932.23047 ± 229.057
2025-05-11 12:33:19,559 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2770.7524, 2992.3938, 3045.9265, 3049.4363, 3019.8232, 2985.1514, 2736.0188, 2384.6082, 3108.9302, 3229.264]
2025-05-11 12:33:19,559 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:33:19,565 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 2 hours, 59 minutes, 53 seconds)
2025-05-11 12:36:06,589 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 12:36:19,201 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3042.41162 ± 271.899
2025-05-11 12:36:19,201 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2399.1545, 3421.5269, 2951.1936, 2987.492, 3232.5305, 3021.314, 2842.7688, 3231.617, 3283.4194, 3053.0989]
2025-05-11 12:36:19,201 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:36:19,207 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 2 hours, 56 minutes, 51 seconds)
2025-05-11 12:39:06,510 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 12:39:19,129 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3064.68213 ± 313.024
2025-05-11 12:39:19,129 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2502.425, 3083.9744, 3114.9897, 2724.1646, 3591.79, 2917.8806, 3395.4465, 2994.9753, 3398.0752, 2923.1003]
2025-05-11 12:39:19,129 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:39:19,136 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 2 hours, 53 minutes, 56 seconds)
2025-05-11 12:42:07,340 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 12:42:19,892 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3075.45801 ± 230.066
2025-05-11 12:42:19,892 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3096.4033, 3488.6233, 2913.7405, 3311.1887, 3169.9646, 3186.6858, 3091.289, 2911.59, 2976.8745, 2608.2207]
2025-05-11 12:42:19,892 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:42:19,899 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 2 hours, 51 minutes, 5 seconds)
2025-05-11 12:45:07,132 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 12:45:19,669 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3127.69678 ± 262.322
2025-05-11 12:45:19,669 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3029.0513, 3117.5603, 3264.0156, 3415.2908, 3385.6953, 3338.016, 3177.1155, 2458.8384, 3084.0378, 3007.3455]
2025-05-11 12:45:19,669 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:45:19,676 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 2 hours, 47 minutes, 55 seconds)
2025-05-11 12:48:06,519 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 12:48:19,063 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3028.76221 ± 286.965
2025-05-11 12:48:19,064 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3313.563, 2822.7107, 2588.6304, 2953.0908, 3254.0498, 2930.278, 3609.1963, 3089.9277, 3012.22, 2713.9543]
2025-05-11 12:48:19,064 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:48:19,071 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 2 hours, 44 minutes, 54 seconds)
2025-05-11 12:51:05,603 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 12:51:18,129 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3047.79150 ± 376.253
2025-05-11 12:51:18,129 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3363.302, 2877.0247, 3064.631, 2503.622, 3232.8777, 2546.0745, 3141.6438, 3724.7542, 3354.723, 2669.2627]
2025-05-11 12:51:18,129 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:51:18,136 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 2 hours, 41 minutes, 48 seconds)
2025-05-11 12:54:05,166 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 12:54:17,686 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3024.11255 ± 288.410
2025-05-11 12:54:17,686 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2917.3425, 3108.4783, 3362.5303, 3325.7334, 3123.6404, 2293.5999, 2950.8818, 3103.2134, 3189.9949, 2865.7102]
2025-05-11 12:54:17,686 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:54:17,693 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 2 hours, 38 minutes, 44 seconds)
2025-05-11 12:57:04,330 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 12:57:16,889 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3106.12939 ± 304.144
2025-05-11 12:57:16,889 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3350.1616, 3476.967, 3222.2898, 2608.743, 2710.1514, 2905.1511, 2848.547, 3532.7502, 3166.792, 3239.7393]
2025-05-11 12:57:16,889 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:57:16,922 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 2 hours, 35 minutes, 29 seconds)
2025-05-11 13:00:03,791 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 13:00:16,349 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3104.36865 ± 428.858
2025-05-11 13:00:16,349 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3118.3193, 3609.4795, 3671.8345, 3194.151, 3376.45, 2174.3943, 2710.3428, 3379.9065, 2947.7173, 2861.0894]
2025-05-11 13:00:16,349 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:00:16,357 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 2 hours, 32 minutes, 26 seconds)
2025-05-11 13:03:03,472 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 13:03:16,062 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3045.96777 ± 446.820
2025-05-11 13:03:16,062 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2576.2913, 3084.6772, 3404.5032, 3118.1382, 3665.2842, 3376.3826, 3344.4233, 2083.2961, 3117.1272, 2689.5542]
2025-05-11 13:03:16,062 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:03:16,069 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 2 hours, 29 minutes, 29 seconds)
2025-05-11 13:06:03,499 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 13:06:16,004 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3134.80273 ± 329.526
2025-05-11 13:06:16,004 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3102.6091, 3582.072, 3018.5098, 3148.0007, 2967.722, 2711.764, 3446.5938, 2951.057, 2698.981, 3720.718]
2025-05-11 13:06:16,004 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:06:16,012 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 2 hours, 26 minutes, 39 seconds)
2025-05-11 13:09:01,889 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 13:09:13,866 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3030.98682 ± 263.889
2025-05-11 13:09:13,866 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2834.455, 2886.6267, 2850.692, 2942.6016, 2837.469, 3184.0056, 3482.5684, 2865.044, 2868.3418, 3558.066]
2025-05-11 13:09:13,866 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:09:13,873 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 2 hours, 23 minutes, 23 seconds)
2025-05-11 13:11:56,687 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 13:12:08,709 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3156.34814 ± 235.312
2025-05-11 13:12:08,709 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3255.7346, 3112.5166, 3161.5088, 3286.811, 3145.428, 3616.0637, 2962.713, 2845.2314, 3380.271, 2797.2021]
2025-05-11 13:12:08,709 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:12:08,709 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (3156.35) for latency ExtremeClogL1U23
2025-05-11 13:12:08,709 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 13:12:08,713 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 13:12:08,725 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 2 hours, 19 minutes, 42 seconds)
2025-05-11 13:14:51,984 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 13:15:04,062 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3023.59521 ± 243.148
2025-05-11 13:15:04,063 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2608.7068, 2900.1436, 2725.7673, 3223.289, 3468.8079, 3208.3623, 2935.831, 3201.5142, 2986.6626, 2976.868]
2025-05-11 13:15:04,063 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:15:04,070 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 2 hours, 16 minutes, 6 seconds)
2025-05-11 13:17:47,857 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 13:17:59,888 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3224.48096 ± 157.426
2025-05-11 13:17:59,888 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3213.8787, 3176.722, 3070.0286, 3342.1162, 3353.993, 3261.7134, 3175.9197, 3322.6729, 3455.6367, 2872.1306]
2025-05-11 13:17:59,888 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:17:59,888 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (3224.48) for latency ExtremeClogL1U23
2025-05-11 13:17:59,888 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 13:17:59,892 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 13:17:59,904 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 2 hours, 12 minutes, 34 seconds)
2025-05-11 13:20:43,759 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 13:20:55,775 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2999.48975 ± 414.815
2025-05-11 13:20:55,775 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2976.178, 3376.5256, 2991.5967, 2780.9373, 2091.3933, 3554.134, 3592.187, 2875.1619, 2848.021, 2908.7646]
2025-05-11 13:20:55,776 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:20:55,783 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 2 hours, 9 minutes, 1 second)
2025-05-11 13:23:39,285 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 13:23:51,378 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2974.83252 ± 252.523
2025-05-11 13:23:51,379 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3154.399, 2382.618, 3185.5212, 3171.2234, 3279.8806, 3052.9197, 2802.886, 3036.705, 2882.5457, 2799.629]
2025-05-11 13:23:51,379 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:23:51,386 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 2 hours, 5 minutes, 46 seconds)
2025-05-11 13:26:34,564 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 13:26:46,623 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3059.69727 ± 344.605
2025-05-11 13:26:46,623 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3298.2227, 3598.7903, 2556.2102, 2675.0576, 3228.3242, 2714.0696, 3491.377, 2757.8787, 3158.5527, 3118.4902]
2025-05-11 13:26:46,623 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:26:46,630 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 2 hours, 2 minutes, 54 seconds)
2025-05-11 13:29:30,827 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 13:29:42,855 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3214.84033 ± 447.839
2025-05-11 13:29:42,856 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2541.3992, 3344.2915, 3177.9436, 3455.8635, 3345.152, 3358.145, 3732.942, 3888.9978, 2853.6782, 2449.991]
2025-05-11 13:29:42,856 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:29:42,864 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 6 seconds)
2025-05-11 13:32:26,759 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 13:32:38,795 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3040.54126 ± 684.781
2025-05-11 13:32:38,795 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3292.4531, 3278.8787, 3584.0994, 2702.0078, 3745.2456, 3121.67, 1218.0756, 3596.581, 2900.5193, 2965.8801]
2025-05-11 13:32:38,795 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:32:38,802 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 57 minutes, 11 seconds)
2025-05-11 13:35:22,524 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 13:35:34,528 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3281.73975 ± 268.530
2025-05-11 13:35:34,528 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3264.8203, 2831.1067, 3194.5386, 3424.9873, 3212.758, 3893.1992, 3260.251, 3368.114, 2979.0427, 3388.5798]
2025-05-11 13:35:34,528 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:35:34,528 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (3281.74) for latency ExtremeClogL1U23
2025-05-11 13:35:34,529 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 13:35:34,532 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 13:35:34,545 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 54 minutes, 14 seconds)
2025-05-11 13:38:17,295 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 13:38:29,329 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3106.43237 ± 206.615
2025-05-11 13:38:29,329 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3376.7583, 3497.5352, 2869.993, 2974.1355, 3105.9358, 3310.8325, 2957.9983, 3052.9316, 3052.282, 2865.9224]
2025-05-11 13:38:29,329 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:38:29,337 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 51 minutes, 12 seconds)
2025-05-11 13:41:14,625 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 13:41:27,183 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3167.80957 ± 486.709
2025-05-11 13:41:27,184 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3153.9658, 3295.7393, 3742.4778, 3063.1738, 2253.1133, 3598.4355, 3788.483, 2545.7268, 2792.9785, 3444.0027]
2025-05-11 13:41:27,184 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:41:27,192 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 48 minutes, 36 seconds)
2025-05-11 13:44:13,902 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 13:44:26,457 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3282.26685 ± 286.965
2025-05-11 13:44:26,457 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3718.4404, 3390.165, 3175.509, 3010.8708, 2787.904, 3167.5916, 3346.7063, 3605.9478, 3019.45, 3600.0828]
2025-05-11 13:44:26,457 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:44:26,458 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (3282.27) for latency ExtremeClogL1U23
2025-05-11 13:44:26,458 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 13:44:26,463 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 13:44:26,476 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 46 minutes, 2 seconds)
2025-05-11 13:47:14,061 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 13:47:26,698 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3105.26709 ± 396.373
2025-05-11 13:47:26,698 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3021.432, 2194.5752, 2922.5571, 2983.4316, 3087.54, 3289.0513, 3645.5015, 3339.0254, 2937.3132, 3632.2444]
2025-05-11 13:47:26,698 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:47:26,708 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 43 minutes, 35 seconds)
2025-05-11 13:50:12,409 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 13:50:24,443 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3212.30591 ± 335.686
2025-05-11 13:50:24,444 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3648.207, 3183.1304, 3502.7964, 3469.7754, 2476.9773, 3175.6433, 2947.8416, 3496.7476, 3300.5425, 2921.3994]
2025-05-11 13:50:24,444 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:50:24,452 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 40 minutes, 51 seconds)
2025-05-11 13:53:27,232 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 13:53:42,797 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2916.08228 ± 598.729
2025-05-11 13:53:42,797 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3157.5461, 1529.8376, 2995.2212, 2774.388, 3340.457, 2557.3962, 3603.1643, 2710.3442, 2727.9236, 3764.5432]
2025-05-11 13:53:42,797 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:53:42,809 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 40 minutes, 28 seconds)
2025-05-11 13:57:10,241 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 13:57:22,865 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3020.43604 ± 324.851
2025-05-11 13:57:22,865 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3190.3267, 2674.9072, 3342.0671, 3228.0483, 3031.9668, 2559.0388, 2414.505, 3221.3, 3186.5586, 3355.6426]
2025-05-11 13:57:22,866 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:57:22,875 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 41 minutes, 56 seconds)
2025-05-11 14:00:27,821 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 14:00:42,211 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3040.28784 ± 382.611
2025-05-11 14:00:42,211 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3021.8774, 3764.6948, 3309.5684, 2917.4402, 2787.761, 3076.7195, 2623.2588, 2707.798, 2599.2664, 3594.496]
2025-05-11 14:00:42,211 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:00:42,221 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 40 minutes, 49 seconds)
2025-05-11 14:03:54,927 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 14:04:10,425 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3144.95752 ± 274.657
2025-05-11 14:04:10,425 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2902.8142, 3471.0693, 2720.587, 3365.1973, 3137.4885, 2876.8972, 3366.316, 2816.658, 3450.0076, 3342.5396]
2025-05-11 14:04:10,425 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:04:10,438 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 40 minutes, 22 seconds)
2025-05-11 14:07:39,551 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 14:07:55,082 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3065.09424 ± 423.131
2025-05-11 14:07:55,082 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2675.529, 3775.4414, 2675.916, 2893.8445, 3108.7925, 3788.1006, 3391.9084, 2974.6143, 2788.8203, 2577.9749]
2025-05-11 14:07:55,082 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:07:55,095 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 41 minutes, 33 seconds)
2025-05-11 14:11:24,804 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 14:11:40,330 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3265.30615 ± 286.149
2025-05-11 14:11:40,330 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3131.4778, 3105.2683, 3237.607, 3473.3923, 3818.6672, 3324.935, 3352.117, 2648.8623, 3409.7192, 3151.0134]
2025-05-11 14:11:40,330 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:11:40,343 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 40 minutes, 34 seconds)
2025-05-11 14:14:45,025 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 14:14:57,564 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3176.49023 ± 203.821
2025-05-11 14:14:57,564 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2840.8623, 3053.5847, 3145.3152, 3157.4824, 3297.161, 3129.5078, 3520.0212, 2945.2068, 3186.265, 3489.4968]
2025-05-11 14:14:57,564 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:14:57,574 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 34 minutes, 55 seconds)
2025-05-11 14:17:45,902 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 14:17:58,467 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3298.90186 ± 295.500
2025-05-11 14:17:58,467 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3318.2427, 3366.5066, 3591.2522, 3413.0796, 2521.2903, 3551.1821, 3378.558, 3388.3237, 3427.358, 3033.2249]
2025-05-11 14:17:58,468 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:17:58,468 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (3298.90) for latency ExtremeClogL1U23
2025-05-11 14:17:58,468 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 14:17:58,473 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 14:17:58,488 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 29 minutes, 48 seconds)
2025-05-11 14:20:46,849 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 14:20:59,438 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3138.09888 ± 305.757
2025-05-11 14:20:59,438 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3333.2124, 3338.9246, 3494.6997, 3595.1736, 2898.446, 3355.0115, 2683.5273, 2986.2593, 2942.0642, 2753.6694]
2025-05-11 14:20:59,438 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:20:59,449 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 24 minutes, 5 seconds)
2025-05-11 14:23:46,329 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 14:23:58,818 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3065.95166 ± 535.029
2025-05-11 14:23:58,818 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3792.9946, 2925.1533, 1806.8951, 3562.6848, 3408.269, 3200.689, 2855.439, 3427.5322, 2653.362, 3026.496]
2025-05-11 14:23:58,818 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:23:58,828 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 17 minutes, 5 seconds)
2025-05-11 14:26:46,658 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 14:26:58,901 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3020.71631 ± 224.188
2025-05-11 14:26:58,902 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2888.9492, 2989.6292, 3249.841, 3414.4026, 2882.583, 3248.3745, 3107.7144, 2901.575, 2920.6294, 2603.4678]
2025-05-11 14:26:58,902 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:26:58,913 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 10 minutes, 25 seconds)
2025-05-11 14:29:46,602 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 14:29:58,523 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3332.30396 ± 421.051
2025-05-11 14:29:58,523 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2404.954, 3227.0728, 3216.4468, 3110.93, 3719.003, 3914.2893, 2954.478, 3594.4583, 3535.496, 3645.9087]
2025-05-11 14:29:58,523 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:29:58,524 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (3332.30) for latency ExtremeClogL1U23
2025-05-11 14:29:58,524 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 14:29:58,528 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 14:29:58,544 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 6 minutes, 4 seconds)
2025-05-11 14:32:46,787 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 14:32:58,544 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2778.68774 ± 377.493
2025-05-11 14:32:58,545 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2556.8794, 2848.8235, 3040.112, 3241.8706, 3114.6318, 2155.5774, 2263.3682, 3291.21, 2733.6707, 2540.7336]
2025-05-11 14:32:58,545 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:32:58,555 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 3 minutes)
2025-05-11 14:35:47,141 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 14:35:58,710 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3317.62378 ± 397.529
2025-05-11 14:35:58,711 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2645.615, 3528.0718, 3480.8513, 3547.526, 3398.4553, 3805.1897, 2583.706, 3486.5864, 3659.442, 3040.796]
2025-05-11 14:35:58,711 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:35:58,722 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 59 minutes, 57 seconds)
2025-05-11 14:38:47,855 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 14:38:59,303 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3229.55396 ± 352.651
2025-05-11 14:38:59,304 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3384.3037, 3308.5305, 3189.345, 3347.4226, 3836.0745, 2690.5955, 3523.142, 2594.3855, 3367.8027, 3053.937]
2025-05-11 14:38:59,304 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:38:59,314 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 57 minutes, 1 second)
2025-05-11 14:41:49,078 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 14:42:00,372 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3205.06250 ± 440.501
2025-05-11 14:42:00,372 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3805.1562, 2326.1636, 3589.8267, 3249.3809, 3065.9666, 3165.3772, 2766.9434, 2970.2864, 3823.957, 3287.5654]
2025-05-11 14:42:00,372 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:42:00,382 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 54 minutes, 5 seconds)
2025-05-11 14:44:49,913 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 14:45:01,287 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3251.98511 ± 344.765
2025-05-11 14:45:01,287 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3547.9873, 2666.364, 3378.2961, 3740.848, 3729.8628, 2847.8599, 3257.7173, 3162.01, 3258.2712, 2930.6348]
2025-05-11 14:45:01,287 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:45:01,297 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 51 minutes, 9 seconds)
2025-05-11 14:47:49,667 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 14:48:01,162 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3265.34912 ± 354.903
2025-05-11 14:48:01,162 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3800.559, 3858.22, 3192.304, 3129.4438, 3250.3271, 3188.2295, 3325.3318, 3054.1655, 3324.3792, 2530.5317]
2025-05-11 14:48:01,162 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:48:01,172 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 48 minutes, 8 seconds)
2025-05-11 14:50:49,556 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 14:51:01,339 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3286.00269 ± 428.524
2025-05-11 14:51:01,339 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2945.0762, 3590.9263, 2924.8794, 3916.8608, 3352.6848, 2915.1282, 2570.5522, 3468.1814, 3926.564, 3249.1724]
2025-05-11 14:51:01,340 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:51:01,350 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 45 minutes, 7 seconds)
2025-05-11 14:53:49,405 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 14:54:01,473 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3227.62476 ± 419.372
2025-05-11 14:54:01,473 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3057.7373, 3686.907, 3719.142, 3673.9937, 3000.1545, 3386.8062, 2488.6826, 2586.877, 3237.8196, 3438.1267]
2025-05-11 14:54:01,473 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:54:01,484 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 42 minutes, 6 seconds)
2025-05-11 14:56:49,336 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 14:57:01,646 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3226.03540 ± 232.581
2025-05-11 14:57:01,646 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3342.5483, 3664.539, 3064.0796, 2902.883, 3322.0283, 3349.7925, 3484.6196, 3076.234, 3101.6594, 2951.968]
2025-05-11 14:57:01,646 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:57:01,657 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 39 minutes, 3 seconds)
2025-05-11 14:59:49,634 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 15:00:02,070 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3249.89111 ± 289.520
2025-05-11 15:00:02,070 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2706.283, 3115.3381, 3053.506, 3816.6724, 3446.4956, 3026.5742, 3197.2332, 3315.6082, 3316.9966, 3504.205]
2025-05-11 15:00:02,070 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 15:00:02,081 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 36 minutes, 1 second)
2025-05-11 15:02:50,392 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 15:03:02,933 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3288.63745 ± 410.517
2025-05-11 15:03:02,933 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4024.8882, 3121.101, 3337.7117, 2642.8943, 3219.2312, 2909.719, 3023.7734, 3875.6602, 3611.353, 3120.0398]
2025-05-11 15:03:02,934 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 15:03:02,945 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 33 minutes, 3 seconds)
2025-05-11 15:05:51,716 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 15:06:04,331 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3268.36353 ± 457.356
2025-05-11 15:06:04,331 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4036.577, 2958.8044, 3352.7976, 2353.9207, 3686.317, 3535.3892, 2808.542, 3521.5125, 3082.2195, 3347.5574]
2025-05-11 15:06:04,331 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 15:06:04,343 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 30 minutes, 5 seconds)
2025-05-11 15:08:50,546 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 15:09:03,153 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3240.20752 ± 364.312
2025-05-11 15:09:03,153 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3483.521, 2553.233, 3117.7756, 2856.603, 3395.2788, 3654.6375, 3584.5798, 3697.6025, 3133.0916, 2925.7542]
2025-05-11 15:09:03,154 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 15:09:03,165 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 27 minutes, 3 seconds)
2025-05-11 15:11:47,400 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 15:12:00,014 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3088.72827 ± 273.268
2025-05-11 15:12:00,014 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3137.7642, 3406.523, 2992.04, 2631.2153, 3276.2751, 3405.5974, 2840.405, 3333.0662, 2688.5376, 3175.8591]
2025-05-11 15:12:00,014 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 15:12:00,027 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 23 minutes, 57 seconds)
2025-05-11 15:14:43,403 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 15:14:56,021 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3043.85986 ± 914.782
2025-05-11 15:14:56,021 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3259.5999, 482.71347, 3906.8696, 3390.7349, 3078.4993, 2814.7974, 3256.3044, 3005.9636, 3875.9436, 3367.1738]
2025-05-11 15:14:56,021 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 15:14:56,033 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 20 minutes, 51 seconds)
2025-05-11 15:17:39,810 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 15:17:52,375 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3421.47705 ± 247.951
2025-05-11 15:17:52,375 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3604.3416, 3513.7258, 3116.2153, 3679.209, 3548.691, 3092.748, 3695.2144, 3136.7803, 3156.153, 3671.692]
2025-05-11 15:17:52,375 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 15:17:52,376 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (3421.48) for latency ExtremeClogL1U23
2025-05-11 15:17:52,376 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 15:17:52,379 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 15:17:52,397 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 17 minutes, 47 seconds)
2025-05-11 15:20:37,434 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 15:20:49,988 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3282.99463 ± 220.759
2025-05-11 15:20:49,989 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3451.6558, 3436.3823, 2938.8638, 3590.326, 2889.5588, 3277.8389, 3234.5378, 3204.3477, 3280.2607, 3526.1711]
2025-05-11 15:20:49,989 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 15:20:50,001 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 14 minutes, 45 seconds)
2025-05-11 15:23:35,236 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 15:23:47,821 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2952.32739 ± 396.557
2025-05-11 15:23:47,821 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3482.4705, 2447.718, 2667.8335, 3287.679, 2416.6892, 2911.0127, 3011.91, 2724.4126, 3658.0415, 2915.506]
2025-05-11 15:23:47,821 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 15:23:47,834 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 11 minutes, 47 seconds)
2025-05-11 15:26:33,476 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 15:26:46,036 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3138.19116 ± 415.605
2025-05-11 15:26:46,036 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3752.9727, 3371.6204, 2585.6016, 2623.6047, 2994.0586, 3329.2925, 3552.7312, 3399.8462, 2498.8171, 3273.3699]
2025-05-11 15:26:46,036 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 15:26:46,049 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 8 minutes, 51 seconds)
2025-05-11 15:29:32,345 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 15:29:44,958 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3116.70190 ± 354.741
2025-05-11 15:29:44,958 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3051.7358, 3302.6558, 3851.7751, 3439.0671, 3340.991, 2936.3071, 2845.582, 2753.4546, 3061.6953, 2583.7568]
2025-05-11 15:29:44,958 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 15:29:44,971 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 5 minutes, 55 seconds)
2025-05-11 15:32:31,181 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 15:32:43,798 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3237.53369 ± 292.613
2025-05-11 15:32:43,798 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3081.195, 2770.7905, 3578.8694, 3155.8025, 2888.197, 3627.9585, 3374.048, 3048.131, 3211.458, 3638.888]
2025-05-11 15:32:43,798 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 15:32:43,812 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 58 seconds)
2025-05-11 15:35:30,164 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 15:35:42,740 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3333.35034 ± 234.730
2025-05-11 15:35:42,740 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3043.2498, 3460.294, 3111.6816, 3436.196, 3252.08, 3395.667, 3694.9631, 3022.824, 3705.3608, 3211.1843]
2025-05-11 15:35:42,740 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 15:35:42,753 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1251 [DEBUG]: Training session finished
