2025-05-07 23:57:22,264 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1006 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-sac-aug-mem4
2025-05-07 23:57:22,265 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1007 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-sac-aug-mem4
2025-05-07 23:57:22,265 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1008 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x719a977c3f10>}
2025-05-07 23:57:22,265 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1009 [DEBUG]: using device: cpu
2025-05-07 23:57:22,265 baseline-sac-noisy-halfcheetah:77 [WARNING]: args.memorize_actions != args.horizon: 4 != 32
2025-05-07 23:57:22,273 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1031 [INFO]: Creating new trainer
2025-05-07 23:57:22,282 baseline-sac-noisy-halfcheetah:111 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=41, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-05-07 23:57:22,282 baseline-sac-noisy-halfcheetah:112 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=47, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-07 23:57:22,489 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1092 [DEBUG]: Starting training session...
2025-05-07 23:57:22,490 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 1/100
2025-05-08 00:00:02,166 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 00:00:15,571 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -150.92270 ± 39.769
2025-05-08 00:00:15,572 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-120.089554, -182.38593, -203.09317, -151.69266, -144.62831, -91.2354, -198.10745, -172.4842, -163.20764, -82.30257]
2025-05-08 00:00:15,572 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 00:00:15,572 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1124 [INFO]: New best (-150.92) for latency ExtremeSparseL4U32
2025-05-08 00:00:15,572 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-08 00:00:15,577 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 00:00:15,583 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 2/100 (estimated time remaining: 4 hours, 45 minutes, 36 seconds)
2025-05-08 00:03:04,279 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 00:03:16,920 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -297.46460 ± 48.579
2025-05-08 00:03:16,921 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-304.08417, -195.62018, -309.51953, -294.34528, -358.48816, -340.24655, -253.41183, -365.73654, -277.2775, -275.9162]
2025-05-08 00:03:16,921 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 00:03:16,922 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 3/100 (estimated time remaining: 4 hours, 49 minutes, 27 seconds)
2025-05-08 00:06:02,455 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 00:06:15,222 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -147.06683 ± 48.246
2025-05-08 00:06:15,222 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-121.13024, -123.07662, -154.89442, -137.76445, -113.399506, -85.90771, -116.88774, -222.27177, -249.24167, -146.09407]
2025-05-08 00:06:15,222 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 00:06:15,223 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1124 [INFO]: New best (-147.07) for latency ExtremeSparseL4U32
2025-05-08 00:06:15,223 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-08 00:06:15,226 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 00:06:15,233 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 4/100 (estimated time remaining: 4 hours, 47 minutes, 5 seconds)
2025-05-08 00:09:00,849 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 00:09:13,628 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -133.29800 ± 66.092
2025-05-08 00:09:13,628 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-166.73291, -142.75455, -63.11397, -186.61223, -149.147, -10.871339, -97.11904, -167.21414, -257.10107, -92.31379]
2025-05-08 00:09:13,628 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 00:09:13,628 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1124 [INFO]: New best (-133.30) for latency ExtremeSparseL4U32
2025-05-08 00:09:13,629 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-08 00:09:13,633 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 00:09:13,640 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 5/100 (estimated time remaining: 4 hours, 44 minutes, 27 seconds)
2025-05-08 00:11:59,196 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 00:12:11,961 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -174.28719 ± 88.465
2025-05-08 00:12:11,961 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-109.67139, -211.7818, -201.78639, -269.4316, -54.14046, -2.3271182, -158.94537, -288.4067, -202.8982, -243.48277]
2025-05-08 00:12:11,962 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 00:12:11,964 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 6/100 (estimated time remaining: 4 hours, 41 minutes, 40 seconds)
2025-05-08 00:14:58,026 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 00:15:10,709 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -92.19980 ± 53.936
2025-05-08 00:15:10,710 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-106.837944, -137.87415, -64.13525, -128.84349, -109.90564, -14.96911, -85.65858, -45.463604, -201.30711, -27.003113]
2025-05-08 00:15:10,710 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 00:15:10,710 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1124 [INFO]: New best (-92.20) for latency ExtremeSparseL4U32
2025-05-08 00:15:10,710 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-08 00:15:10,713 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 00:15:10,720 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 7/100 (estimated time remaining: 4 hours, 40 minutes, 28 seconds)
2025-05-08 00:17:56,687 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 00:18:09,435 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -99.78508 ± 43.983
2025-05-08 00:18:09,435 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-53.424137, -37.97107, -105.55849, -95.517944, -99.97396, -103.54798, -112.28312, -67.77198, -209.15985, -112.64236]
2025-05-08 00:18:09,435 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 00:18:09,437 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 8/100 (estimated time remaining: 4 hours, 36 minutes, 40 seconds)
2025-05-08 00:20:55,781 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 00:21:08,477 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -49.83796 ± 70.702
2025-05-08 00:21:08,477 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [104.609344, -25.790257, -107.194435, -69.93841, -30.383532, -164.05345, -37.929363, -107.949585, -75.27805, 15.528175]
2025-05-08 00:21:08,477 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 00:21:08,477 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1124 [INFO]: New best (-49.84) for latency ExtremeSparseL4U32
2025-05-08 00:21:08,478 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-08 00:21:08,482 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 00:21:08,488 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 9/100 (estimated time remaining: 4 hours, 33 minutes, 55 seconds)
2025-05-08 00:23:54,580 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 00:24:07,304 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -37.20377 ± 30.389
2025-05-08 00:24:07,305 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [10.451954, -25.56342, -28.773249, -74.98951, -52.995037, 22.147017, -46.423428, -50.81942, -61.209248, -63.86332]
2025-05-08 00:24:07,305 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 00:24:07,305 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1124 [INFO]: New best (-37.20) for latency ExtremeSparseL4U32
2025-05-08 00:24:07,305 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-08 00:24:07,309 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 00:24:07,316 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 10/100 (estimated time remaining: 4 hours, 31 minutes, 4 seconds)
2025-05-08 00:26:53,795 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 00:27:06,471 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -32.23653 ± 102.071
2025-05-08 00:27:06,471 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-119.705765, 2.7123463, -140.28139, 13.716159, 188.95139, 31.262447, -85.62923, -85.31239, -166.21538, 38.13653]
2025-05-08 00:27:06,471 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 00:27:06,471 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1124 [INFO]: New best (-32.24) for latency ExtremeSparseL4U32
2025-05-08 00:27:06,472 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-08 00:27:06,475 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 00:27:06,483 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 11/100 (estimated time remaining: 4 hours, 28 minutes, 21 seconds)
2025-05-08 00:29:52,680 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 00:30:05,512 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -4.38887 ± 70.726
2025-05-08 00:30:05,512 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [28.913412, 45.77642, -30.993814, -190.28442, -15.134215, -8.4756565, 38.944363, 88.0087, 16.282593, -16.926033]
2025-05-08 00:30:05,512 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 00:30:05,512 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1124 [INFO]: New best (-4.39) for latency ExtremeSparseL4U32
2025-05-08 00:30:05,512 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-08 00:30:05,516 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 00:30:05,525 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 12/100 (estimated time remaining: 4 hours, 25 minutes, 27 seconds)
2025-05-08 00:32:51,849 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 00:33:04,584 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1.68860 ± 94.028
2025-05-08 00:33:04,584 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-133.6144, 91.54139, 162.55524, 57.69088, -78.2446, -78.85201, -107.24028, 91.59273, -3.2246068, 14.6816435]
2025-05-08 00:33:04,584 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 00:33:04,584 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1124 [INFO]: New best (1.69) for latency ExtremeSparseL4U32
2025-05-08 00:33:04,585 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-08 00:33:04,589 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 00:33:04,596 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 13/100 (estimated time remaining: 4 hours, 22 minutes, 34 seconds)
2025-05-08 00:35:49,571 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 00:36:02,303 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 139.89607 ± 123.384
2025-05-08 00:36:02,303 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [205.65923, 185.35605, 294.146, 232.33836, -131.92517, 40.374718, 188.14287, 94.04884, 33.106514, 257.71323]
2025-05-08 00:36:02,303 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 00:36:02,304 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1124 [INFO]: New best (139.90) for latency ExtremeSparseL4U32
2025-05-08 00:36:02,304 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-08 00:36:02,308 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 00:36:02,316 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 14/100 (estimated time remaining: 4 hours, 19 minutes, 12 seconds)
2025-05-08 00:38:47,370 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 00:39:00,651 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 69.88096 ± 137.487
2025-05-08 00:39:00,651 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [232.90324, -61.796665, 181.62997, -181.62355, 205.11363, 71.631645, 153.88492, 107.73139, 119.71779, -130.38283]
2025-05-08 00:39:00,651 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 00:39:00,655 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 15/100 (estimated time remaining: 4 hours, 16 minutes, 5 seconds)
2025-05-08 00:41:50,080 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 00:42:03,293 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 144.80954 ± 166.965
2025-05-08 00:42:03,294 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [10.184868, 163.19264, 367.528, 365.42334, 63.08933, 337.43323, 251.22916, 21.773914, -104.68656, -27.072575]
2025-05-08 00:42:03,294 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 00:42:03,294 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1124 [INFO]: New best (144.81) for latency ExtremeSparseL4U32
2025-05-08 00:42:03,294 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-08 00:42:03,299 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 00:42:03,307 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 16/100 (estimated time remaining: 4 hours, 14 minutes, 6 seconds)
2025-05-08 00:44:52,097 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 00:45:05,274 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 206.21297 ± 263.303
2025-05-08 00:45:05,274 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [614.0371, 101.81913, 425.97275, 330.37723, -76.58778, -63.891716, 450.22818, 407.5237, -227.49434, 100.145256]
2025-05-08 00:45:05,274 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 00:45:05,275 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1124 [INFO]: New best (206.21) for latency ExtremeSparseL4U32
2025-05-08 00:45:05,275 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-08 00:45:05,279 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 00:45:05,287 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 17/100 (estimated time remaining: 4 hours, 11 minutes, 56 seconds)
2025-05-08 00:47:47,345 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 00:48:00,009 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 206.09818 ± 239.299
2025-05-08 00:48:00,010 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [33.936607, 309.309, 472.8242, 478.65292, -55.033157, 218.14941, -217.27309, 492.85736, 327.37692, 0.18173946]
2025-05-08 00:48:00,010 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 00:48:00,013 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 18/100 (estimated time remaining: 4 hours, 7 minutes, 43 seconds)
2025-05-08 00:50:41,099 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 00:50:53,706 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 419.79941 ± 153.082
2025-05-08 00:50:53,706 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [199.70549, 326.37375, 556.19824, 395.6039, 310.34924, 176.30447, 598.4896, 568.66254, 591.99506, 474.31204]
2025-05-08 00:50:53,706 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 00:50:53,707 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1124 [INFO]: New best (419.80) for latency ExtremeSparseL4U32
2025-05-08 00:50:53,707 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-08 00:50:53,711 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 00:50:53,718 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 19/100 (estimated time remaining: 4 hours, 3 minutes, 38 seconds)
2025-05-08 00:53:38,696 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 00:53:51,494 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 499.95490 ± 284.967
2025-05-08 00:53:51,495 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [649.725, -77.05988, 856.57904, 565.29224, 762.3534, 412.56744, 384.02228, 582.6855, 762.70544, 100.6781]
2025-05-08 00:53:51,495 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 00:53:51,495 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1124 [INFO]: New best (499.95) for latency ExtremeSparseL4U32
2025-05-08 00:53:51,495 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-08 00:53:51,499 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 00:53:51,507 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 20/100 (estimated time remaining: 4 hours, 31 seconds)
2025-05-08 00:56:36,537 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 00:56:49,218 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 478.29599 ± 335.059
2025-05-08 00:56:49,218 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [868.6271, 676.8149, 429.0687, 775.464, 651.92993, 69.395134, -27.169706, 708.4968, 690.0565, -59.72382]
2025-05-08 00:56:49,218 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 00:56:49,221 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 21/100 (estimated time remaining: 3 hours, 56 minutes, 14 seconds)
2025-05-08 00:59:34,140 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 00:59:46,811 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 640.47717 ± 321.687
2025-05-08 00:59:46,812 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [92.82039, 839.80145, 497.3776, 527.752, 988.2312, 530.33966, 982.0533, 1058.3638, 727.2307, 160.80226]
2025-05-08 00:59:46,812 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 00:59:46,812 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1124 [INFO]: New best (640.48) for latency ExtremeSparseL4U32
2025-05-08 00:59:46,812 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-08 00:59:46,817 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 00:59:46,824 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 22/100 (estimated time remaining: 3 hours, 52 minutes, 8 seconds)
2025-05-08 01:02:31,864 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 01:02:44,565 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 860.22968 ± 393.055
2025-05-08 01:02:44,565 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1017.31366, 921.2407, 1052.4764, 188.05826, 926.8667, 806.666, 1119.6459, 74.76975, 1113.3947, 1381.8647]
2025-05-08 01:02:44,565 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 01:02:44,566 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1124 [INFO]: New best (860.23) for latency ExtremeSparseL4U32
2025-05-08 01:02:44,566 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-08 01:02:44,570 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 01:02:44,579 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 23/100 (estimated time remaining: 3 hours, 49 minutes, 59 seconds)
2025-05-08 01:05:29,626 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 01:05:42,348 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 939.39832 ± 594.572
2025-05-08 01:05:42,348 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1568.9705, 1446.049, 74.47526, 1240.8938, 39.2856, 1422.3984, 57.861214, 1193.3666, 1022.57153, 1328.111]
2025-05-08 01:05:42,348 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 01:05:42,349 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1124 [INFO]: New best (939.40) for latency ExtremeSparseL4U32
2025-05-08 01:05:42,349 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-08 01:05:42,352 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 01:05:42,361 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 24/100 (estimated time remaining: 3 hours, 48 minutes, 5 seconds)
2025-05-08 01:08:27,389 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 01:08:40,067 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1052.30896 ± 569.851
2025-05-08 01:08:40,067 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1332.8965, 1376.3195, 69.71063, 1515.205, 1472.4259, 1378.7931, 793.22375, -108.11826, 1262.2803, 1430.3533]
2025-05-08 01:08:40,067 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 01:08:40,067 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1124 [INFO]: New best (1052.31) for latency ExtremeSparseL4U32
2025-05-08 01:08:40,068 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-08 01:08:40,072 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 01:08:40,080 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 25/100 (estimated time remaining: 3 hours, 45 minutes, 6 seconds)
2025-05-08 01:11:24,883 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 01:11:37,521 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1571.26196 ± 168.538
2025-05-08 01:11:37,521 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1674.3967, 1870.4408, 1358.315, 1647.5498, 1359.7131, 1462.7251, 1472.1138, 1434.8596, 1769.7611, 1662.7458]
2025-05-08 01:11:37,521 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 01:11:37,522 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1124 [INFO]: New best (1571.26) for latency ExtremeSparseL4U32
2025-05-08 01:11:37,522 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-08 01:11:37,526 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 01:11:37,534 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 26/100 (estimated time remaining: 3 hours, 42 minutes, 4 seconds)
2025-05-08 01:14:22,388 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 01:14:35,041 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1650.34277 ± 158.253
2025-05-08 01:14:35,041 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1703.8097, 1537.0192, 1784.5967, 1401.6488, 1595.4434, 1737.9486, 1591.8154, 1751.0184, 1449.3925, 1950.735]
2025-05-08 01:14:35,042 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 01:14:35,042 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1124 [INFO]: New best (1650.34) for latency ExtremeSparseL4U32
2025-05-08 01:14:35,042 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-08 01:14:35,046 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 01:14:35,055 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 27/100 (estimated time remaining: 3 hours, 39 minutes, 5 seconds)
2025-05-08 01:17:20,489 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 01:17:33,134 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1815.82690 ± 197.117
2025-05-08 01:17:33,134 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1976.6776, 1893.2753, 1809.4181, 2078.8674, 1475.027, 1933.295, 2017.4011, 1604.6785, 1822.5829, 1547.0464]
2025-05-08 01:17:33,134 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 01:17:33,134 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1124 [INFO]: New best (1815.83) for latency ExtremeSparseL4U32
2025-05-08 01:17:33,135 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-08 01:17:33,138 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 01:17:33,147 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 28/100 (estimated time remaining: 3 hours, 36 minutes, 13 seconds)
2025-05-08 01:20:17,815 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 01:20:30,500 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1395.61096 ± 480.133
2025-05-08 01:20:30,500 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1756.3029, 621.18494, 1203.7197, 1772.8358, 1906.4895, 1405.8132, 1829.6068, 1264.2833, 484.583, 1711.2903]
2025-05-08 01:20:30,500 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 01:20:30,505 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 29/100 (estimated time remaining: 3 hours, 33 minutes, 9 seconds)
2025-05-08 01:23:15,520 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 01:23:28,162 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2006.35608 ± 164.431
2025-05-08 01:23:28,162 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1908.5211, 2020.2784, 2250.6416, 1838.9382, 1818.525, 1915.2654, 2050.2068, 2224.2654, 2217.9932, 1818.9231]
2025-05-08 01:23:28,162 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 01:23:28,162 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1124 [INFO]: New best (2006.36) for latency ExtremeSparseL4U32
2025-05-08 01:23:28,163 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-08 01:23:28,166 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 01:23:28,176 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 30/100 (estimated time remaining: 3 hours, 30 minutes, 10 seconds)
2025-05-08 01:26:13,255 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 01:26:25,953 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1587.80823 ± 330.946
2025-05-08 01:26:25,953 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1938.3702, 1687.082, 812.2522, 1553.0958, 1611.8855, 1654.5221, 2005.0524, 1497.0875, 1851.634, 1267.1005]
2025-05-08 01:26:25,953 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 01:26:25,958 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 31/100 (estimated time remaining: 3 hours, 27 minutes, 17 seconds)
2025-05-08 01:29:10,740 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 01:29:23,322 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1867.37732 ± 278.229
2025-05-08 01:29:23,323 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1765.8087, 1833.0995, 1261.504, 1975.611, 1931.2161, 2248.7969, 2038.3278, 2225.48, 1793.7228, 1600.2062]
2025-05-08 01:29:23,323 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 01:29:23,328 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 32/100 (estimated time remaining: 3 hours, 24 minutes, 18 seconds)
2025-05-08 01:32:08,515 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 01:32:21,065 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1613.89490 ± 655.662
2025-05-08 01:32:21,066 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2234.9912, 1400.6095, 2327.2664, 970.71844, 247.26471, 2312.173, 1649.2421, 1045.4496, 1932.2216, 2019.013]
2025-05-08 01:32:21,066 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 01:32:21,071 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 33/100 (estimated time remaining: 3 hours, 21 minutes, 15 seconds)
2025-05-08 01:35:03,046 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 01:35:15,598 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2022.53357 ± 279.774
2025-05-08 01:35:15,598 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2155.262, 2254.808, 2297.7786, 1657.6223, 2130.5828, 1359.7119, 1970.8271, 2043.8083, 2170.277, 2184.6592]
2025-05-08 01:35:15,598 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 01:35:15,599 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1124 [INFO]: New best (2022.53) for latency ExtremeSparseL4U32
2025-05-08 01:35:15,599 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-08 01:35:15,603 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 01:35:15,612 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 34/100 (estimated time remaining: 3 hours, 17 minutes, 40 seconds)
2025-05-08 01:37:57,475 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 01:38:10,126 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1550.44336 ± 692.668
2025-05-08 01:38:10,126 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2262.097, 1675.9181, 1942.6868, 278.8959, 1827.117, 2124.101, 1333.4078, 2085.8535, 1739.2, 235.15645]
2025-05-08 01:38:10,126 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 01:38:10,131 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 35/100 (estimated time remaining: 3 hours, 14 minutes, 1 second)
2025-05-08 01:40:51,499 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 01:41:04,088 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2009.76624 ± 318.191
2025-05-08 01:41:04,088 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2313.7034, 2385.1113, 1706.4348, 1797.7594, 1389.7909, 2129.4106, 1991.9633, 2297.7627, 2324.0251, 1761.7003]
2025-05-08 01:41:04,088 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 01:41:04,093 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 36/100 (estimated time remaining: 3 hours, 10 minutes, 15 seconds)
2025-05-08 01:43:46,225 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 01:43:58,916 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1779.76013 ± 398.175
2025-05-08 01:43:58,916 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2371.8254, 2269.9275, 1627.3745, 1868.2201, 1684.4169, 1813.9799, 925.463, 2084.7852, 1422.0881, 1729.522]
2025-05-08 01:43:58,917 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 01:43:58,922 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 37/100 (estimated time remaining: 3 hours, 6 minutes, 47 seconds)
2025-05-08 01:46:43,916 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 01:46:56,614 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1565.51941 ± 513.628
2025-05-08 01:46:56,615 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2179.1277, 1878.5988, 2342.628, 1585.7816, 1704.368, 1365.8237, 1335.5021, 1343.9885, 1535.9188, 383.45663]
2025-05-08 01:46:56,615 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 01:46:56,620 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 38/100 (estimated time remaining: 3 hours, 3 minutes, 51 seconds)
2025-05-08 01:49:41,234 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 01:49:53,908 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1879.31812 ± 304.164
2025-05-08 01:49:53,908 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1915.6016, 2087.713, 2103.6284, 1500.7047, 2095.296, 1794.6051, 1266.821, 1640.263, 2160.8677, 2227.6824]
2025-05-08 01:49:53,908 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 01:49:53,913 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 39/100 (estimated time remaining: 3 hours, 1 minute, 30 seconds)
2025-05-08 01:52:38,254 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 01:52:50,924 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1649.64844 ± 549.046
2025-05-08 01:52:50,924 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1609.3439, 1858.1223, 136.40933, 2011.3235, 1328.0887, 1659.4673, 2031.6316, 2060.477, 1854.0344, 1947.5865]
2025-05-08 01:52:50,925 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 01:52:50,929 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 59 minutes, 5 seconds)
2025-05-08 01:55:35,230 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 01:55:47,814 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1961.60486 ± 343.450
2025-05-08 01:55:47,814 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2330.8345, 2153.03, 2343.856, 1340.487, 2275.9426, 1985.3632, 1918.9175, 1354.2291, 2004.4066, 1908.983]
2025-05-08 01:55:47,814 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 01:55:47,819 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 41/100 (estimated time remaining: 2 hours, 56 minutes, 44 seconds)
2025-05-08 01:58:32,483 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 01:58:45,197 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1864.85474 ± 291.067
2025-05-08 01:58:45,197 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1381.7206, 2225.698, 2022.8466, 1703.0278, 1601.763, 2412.6409, 1873.9497, 1621.1996, 1922.8328, 1882.8699]
2025-05-08 01:58:45,197 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 01:58:45,203 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 42/100 (estimated time remaining: 2 hours, 54 minutes, 18 seconds)
2025-05-08 02:01:30,339 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 02:01:43,021 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1830.16528 ± 271.530
2025-05-08 02:01:43,021 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1979.5919, 1775.357, 1982.5211, 1803.3291, 2203.6057, 1778.2318, 1897.3508, 2068.8254, 1656.6954, 1156.1447]
2025-05-08 02:01:43,021 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 02:01:43,027 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 43/100 (estimated time remaining: 2 hours, 51 minutes, 22 seconds)
2025-05-08 02:04:29,623 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 02:04:42,645 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1999.92932 ± 337.515
2025-05-08 02:04:42,645 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2296.9128, 2410.7573, 2232.0854, 1418.0311, 2241.0217, 2018.21, 1561.7856, 1903.0653, 1622.2054, 2295.2183]
2025-05-08 02:04:42,645 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 02:04:42,651 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 44/100 (estimated time remaining: 2 hours, 48 minutes, 51 seconds)
2025-05-08 02:07:32,315 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 02:07:45,401 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1960.28198 ± 330.062
2025-05-08 02:07:45,401 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2475.9558, 2184.7766, 1962.3896, 1417.2036, 2440.4438, 1601.9927, 1808.9806, 2024.1349, 1663.2373, 2023.7076]
2025-05-08 02:07:45,402 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 02:07:45,408 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 45/100 (estimated time remaining: 2 hours, 46 minutes, 58 seconds)
2025-05-08 02:10:33,241 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 02:10:45,990 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2067.51758 ± 210.152
2025-05-08 02:10:45,991 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1773.3037, 1738.411, 2063.524, 2339.201, 1964.8845, 2279.1162, 2069.3162, 1930.1405, 2371.9983, 2145.282]
2025-05-08 02:10:45,991 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 02:10:45,991 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1124 [INFO]: New best (2067.52) for latency ExtremeSparseL4U32
2025-05-08 02:10:45,991 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-08 02:10:45,995 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 02:10:46,006 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 46/100 (estimated time remaining: 2 hours, 44 minutes, 40 seconds)
2025-05-08 02:13:31,940 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 02:13:44,657 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1979.60193 ± 269.308
2025-05-08 02:13:44,657 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2086.8845, 1592.2639, 2191.1416, 2121.1838, 2466.267, 1893.562, 1783.7551, 2010.8182, 2114.5542, 1535.5894]
2025-05-08 02:13:44,657 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 02:13:44,664 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 47/100 (estimated time remaining: 2 hours, 41 minutes, 54 seconds)
2025-05-08 02:16:30,298 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 02:16:42,929 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2026.71875 ± 260.363
2025-05-08 02:16:42,929 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2121.4155, 1712.0933, 2294.0403, 1833.4573, 2174.8508, 1917.706, 2526.1567, 1609.2019, 1998.9486, 2079.3174]
2025-05-08 02:16:42,929 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 02:16:42,936 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 48/100 (estimated time remaining: 2 hours, 38 minutes, 59 seconds)
2025-05-08 02:19:27,842 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 02:19:40,479 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2120.09717 ± 269.110
2025-05-08 02:19:40,479 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2267.1455, 2089.2131, 1990.2118, 1632.3896, 1935.1969, 2327.3713, 2184.4438, 1814.6957, 2399.3516, 2560.9517]
2025-05-08 02:19:40,480 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 02:19:40,480 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1124 [INFO]: New best (2120.10) for latency ExtremeSparseL4U32
2025-05-08 02:19:40,480 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-08 02:19:40,484 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 02:19:40,495 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 49/100 (estimated time remaining: 2 hours, 35 minutes, 37 seconds)
2025-05-08 02:22:25,136 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 02:22:37,872 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1937.02795 ± 341.490
2025-05-08 02:22:37,872 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2153.1047, 1654.6055, 2310.6353, 2455.0369, 2309.5232, 1971.8263, 1862.9557, 1702.8636, 1521.295, 1428.434]
2025-05-08 02:22:37,872 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 02:22:37,879 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 50/100 (estimated time remaining: 2 hours, 31 minutes, 43 seconds)
2025-05-08 02:25:22,860 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 02:25:35,549 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1912.29260 ± 448.827
2025-05-08 02:25:35,549 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1190.3586, 2294.702, 2056.1562, 2318.4824, 1340.7, 2238.731, 1963.9298, 1418.3158, 2579.5159, 1722.0336]
2025-05-08 02:25:35,549 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 02:25:35,556 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 51/100 (estimated time remaining: 2 hours, 28 minutes, 15 seconds)
2025-05-08 02:28:20,548 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 02:28:33,186 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2171.47241 ± 251.855
2025-05-08 02:28:33,187 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2369.562, 2277.2268, 1997.5782, 2319.0613, 2242.1655, 1618.6185, 1879.8738, 2281.555, 2521.8914, 2207.1936]
2025-05-08 02:28:33,187 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 02:28:33,187 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1124 [INFO]: New best (2171.47) for latency ExtremeSparseL4U32
2025-05-08 02:28:33,187 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-08 02:28:33,191 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 02:28:33,203 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 52/100 (estimated time remaining: 2 hours, 25 minutes, 7 seconds)
2025-05-08 02:31:17,936 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 02:31:30,612 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1912.59729 ± 320.064
2025-05-08 02:31:30,613 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2642.1084, 1944.1013, 1682.7676, 1937.7379, 1722.3715, 1815.0634, 2012.6427, 1435.8486, 1693.7806, 2239.5513]
2025-05-08 02:31:30,613 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 02:31:30,620 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 53/100 (estimated time remaining: 2 hours, 22 minutes, 1 second)
2025-05-08 02:34:15,887 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 02:34:28,542 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1978.84058 ± 246.812
2025-05-08 02:34:28,542 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1897.2566, 2253.851, 2002.5472, 1811.5681, 2230.9673, 1970.822, 2387.4937, 1482.441, 1879.4725, 1871.9884]
2025-05-08 02:34:28,542 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 02:34:28,549 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 54/100 (estimated time remaining: 2 hours, 19 minutes, 7 seconds)
2025-05-08 02:37:13,642 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 02:37:26,269 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2171.72803 ± 195.098
2025-05-08 02:37:26,269 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2160.8315, 2220.578, 1820.204, 1958.5381, 2217.1345, 2479.2139, 2323.0605, 2423.2725, 2050.125, 2064.3232]
2025-05-08 02:37:26,269 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 02:37:26,270 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1124 [INFO]: New best (2171.73) for latency ExtremeSparseL4U32
2025-05-08 02:37:26,270 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-08 02:37:26,275 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 02:37:26,286 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 55/100 (estimated time remaining: 2 hours, 16 minutes, 13 seconds)
2025-05-08 02:40:11,869 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 02:40:24,451 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2213.12280 ± 293.583
2025-05-08 02:40:24,451 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1658.1348, 2315.998, 2231.4346, 2825.4426, 2442.6543, 2115.742, 2060.7224, 2376.5942, 2006.5865, 2097.9197]
2025-05-08 02:40:24,451 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 02:40:24,451 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1124 [INFO]: New best (2213.12) for latency ExtremeSparseL4U32
2025-05-08 02:40:24,451 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-08 02:40:24,456 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 02:40:24,468 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 56/100 (estimated time remaining: 2 hours, 13 minutes, 20 seconds)
2025-05-08 02:43:10,124 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 02:43:22,730 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2003.17151 ± 190.067
2025-05-08 02:43:22,730 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1977.227, 2222.351, 2023.5956, 2156.2222, 2010.9421, 1907.2765, 1921.8223, 1529.0071, 2211.1013, 2072.17]
2025-05-08 02:43:22,730 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 02:43:22,737 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 57/100 (estimated time remaining: 2 hours, 10 minutes, 27 seconds)
2025-05-08 02:46:11,620 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 02:46:24,807 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1636.82874 ± 688.594
2025-05-08 02:46:24,808 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2053.285, 2379.1594, 1948.3645, 2173.6401, 1996.299, 1937.779, 478.8829, 243.73822, 1329.7812, 1827.3573]
2025-05-08 02:46:24,808 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 02:46:24,816 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 58/100 (estimated time remaining: 2 hours, 8 minutes, 10 seconds)
2025-05-08 02:49:14,928 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 02:49:28,074 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2174.94995 ± 239.712
2025-05-08 02:49:28,074 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2123.5535, 1730.65, 2311.225, 2612.3276, 2309.0225, 2253.0132, 2070.9688, 1975.7246, 1969.3206, 2393.6948]
2025-05-08 02:49:28,074 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 02:49:28,083 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 59/100 (estimated time remaining: 2 hours, 5 minutes, 56 seconds)
2025-05-08 02:52:16,011 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 02:52:29,145 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2110.65625 ± 361.846
2025-05-08 02:52:29,145 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2331.2896, 1376.9398, 2328.4773, 2419.7666, 2251.2754, 2066.6987, 2441.9885, 2467.791, 1687.7449, 1734.5911]
2025-05-08 02:52:29,145 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 02:52:29,153 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 3 minutes, 23 seconds)
2025-05-08 02:55:16,869 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 02:55:30,015 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1875.01526 ± 663.061
2025-05-08 02:55:30,015 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2080.6143, 1747.2032, 1723.8708, 2565.9941, 109.8533, 1925.2059, 2647.4368, 2224.6948, 1868.3508, 1856.9286]
2025-05-08 02:55:30,015 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 02:55:30,023 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 61/100 (estimated time remaining: 2 hours, 44 seconds)
2025-05-08 02:58:16,902 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 02:58:30,055 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2096.97754 ± 248.171
2025-05-08 02:58:30,055 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2033.007, 1773.5338, 2585.0852, 2071.9114, 2240.1523, 1672.2075, 2072.1025, 2161.2173, 2336.4077, 2024.1512]
2025-05-08 02:58:30,055 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 02:58:30,063 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 57 minutes, 57 seconds)
2025-05-08 03:01:18,458 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 03:01:31,310 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2194.52319 ± 403.974
2025-05-08 03:01:31,310 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2725.3645, 2747.281, 2343.387, 1870.4652, 2170.5508, 2177.7744, 2445.5413, 2002.7347, 2171.072, 1291.0603]
2025-05-08 03:01:31,310 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 03:01:31,317 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 54 minutes, 49 seconds)
2025-05-08 03:04:13,134 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 03:04:25,640 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1831.18396 ± 454.853
2025-05-08 03:04:25,640 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1221.4856, 2036.3933, 2228.1396, 2377.838, 2039.349, 1611.0448, 1646.9387, 2441.6853, 1704.9963, 1003.9683]
2025-05-08 03:04:25,640 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 03:04:25,649 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 50 minutes, 41 seconds)
2025-05-08 03:07:06,360 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 03:07:18,877 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2039.11914 ± 247.319
2025-05-08 03:07:18,877 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1796.5231, 2145.9663, 2291.3972, 2028.5226, 1810.2166, 1876.57, 2337.341, 2163.6619, 2347.9612, 1593.0322]
2025-05-08 03:07:18,877 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 03:07:18,886 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 46 minutes, 46 seconds)
2025-05-08 03:10:00,507 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 03:10:13,074 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2349.48682 ± 213.909
2025-05-08 03:10:13,074 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2401.4185, 2226.7463, 2224.1875, 2203.125, 2648.0964, 2665.05, 1976.98, 2335.1555, 2591.9136, 2222.1948]
2025-05-08 03:10:13,074 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 03:10:13,075 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1124 [INFO]: New best (2349.49) for latency ExtremeSparseL4U32
2025-05-08 03:10:13,075 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-08 03:10:13,079 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 03:10:13,092 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 43 minutes, 1 second)
2025-05-08 03:12:53,727 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 03:13:06,216 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2241.81055 ± 308.343
2025-05-08 03:13:06,216 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2282.052, 1785.3246, 2585.6917, 2074.4795, 2545.163, 1779.0215, 2020.86, 2244.4087, 2706.8154, 2394.29]
2025-05-08 03:13:06,216 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 03:13:06,224 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 39 minutes, 17 seconds)
2025-05-08 03:15:46,374 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 03:15:58,872 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2150.64014 ± 478.734
2025-05-08 03:15:58,872 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2244.9058, 2557.758, 1350.9939, 1852.4552, 1427.0447, 2538.0046, 2814.6003, 2080.8577, 1994.1292, 2645.6497]
2025-05-08 03:15:58,872 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 03:15:58,880 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 35 minutes, 25 seconds)
2025-05-08 03:18:40,397 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 03:18:52,922 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2025.57397 ± 429.212
2025-05-08 03:18:52,922 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1907.6508, 2458.7522, 2607.9905, 2369.7468, 1253.8182, 1406.7617, 2238.8584, 2306.3828, 1751.6636, 1954.1149]
2025-05-08 03:18:52,922 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 03:18:52,931 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 32 minutes, 30 seconds)
2025-05-08 03:21:36,177 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 03:21:48,635 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2277.46729 ± 186.478
2025-05-08 03:21:48,635 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2125.594, 2258.025, 2552.21, 2202.923, 2255.728, 1869.7821, 2458.4424, 2332.9268, 2476.3223, 2242.721]
2025-05-08 03:21:48,635 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 03:21:48,643 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 29 minutes, 52 seconds)
2025-05-08 03:24:31,795 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 03:24:44,321 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2072.50293 ± 326.905
2025-05-08 03:24:44,322 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1520.9838, 2088.7622, 2343.6372, 2648.692, 2166.4387, 2129.6462, 1562.156, 2067.3633, 1892.7817, 2304.5679]
2025-05-08 03:24:44,322 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 03:24:44,330 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 27 minutes, 7 seconds)
2025-05-08 03:27:27,196 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 03:27:39,742 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2145.36841 ± 337.837
2025-05-08 03:27:39,742 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2123.5352, 2279.0728, 2479.4407, 2321.5227, 1919.8456, 2179.7551, 1314.8688, 2335.8335, 2550.2734, 1949.5349]
2025-05-08 03:27:39,743 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 03:27:39,751 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 24 minutes, 26 seconds)
2025-05-08 03:30:23,422 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 03:30:35,869 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2309.34912 ± 184.170
2025-05-08 03:30:35,870 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2416.8667, 2553.798, 2275.589, 2418.9663, 2153.9907, 2294.9775, 2622.0378, 2233.5183, 2108.5442, 2015.1985]
2025-05-08 03:30:35,870 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 03:30:35,879 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 21 minutes, 51 seconds)
2025-05-08 03:33:19,766 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 03:33:32,298 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1972.01953 ± 531.488
2025-05-08 03:33:32,298 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2183.1575, 1922.1187, 2666.999, 765.5334, 1618.6649, 2179.8135, 2061.6392, 2390.66, 1471.9861, 2459.6235]
2025-05-08 03:33:32,298 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 03:33:32,308 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 19 minutes, 8 seconds)
2025-05-08 03:36:15,486 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 03:36:28,009 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2005.66016 ± 654.600
2025-05-08 03:36:28,009 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2220.4802, 2081.437, 218.65773, 2238.243, 2534.9326, 2399.4717, 2530.1792, 1964.475, 2297.3, 1571.4255]
2025-05-08 03:36:28,009 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 03:36:28,021 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 16 minutes, 12 seconds)
2025-05-08 03:39:08,816 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 03:39:21,288 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2155.09351 ± 279.672
2025-05-08 03:39:21,288 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2082.2031, 1964.099, 2379.4102, 2280.8293, 2152.8865, 1896.6992, 2295.7012, 1885.5667, 2791.046, 1822.494]
2025-05-08 03:39:21,288 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 03:39:21,298 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 13 minutes, 4 seconds)
2025-05-08 03:42:01,534 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 03:42:14,078 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2160.49512 ± 300.199
2025-05-08 03:42:14,079 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2304.0203, 1876.2916, 2140.1602, 2753.2148, 2060.1494, 1863.4856, 2659.6973, 1940.5189, 2042.7568, 1964.6545]
2025-05-08 03:42:14,079 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 03:42:14,087 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 9 minutes, 56 seconds)
2025-05-08 03:44:54,057 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 03:45:06,546 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2440.08838 ± 247.331
2025-05-08 03:45:06,546 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2348.7034, 2281.188, 1947.7245, 2406.4897, 2665.8145, 2740.9958, 2200.9097, 2451.1611, 2782.9692, 2574.928]
2025-05-08 03:45:06,546 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 03:45:06,546 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1124 [INFO]: New best (2440.09) for latency ExtremeSparseL4U32
2025-05-08 03:45:06,546 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-08 03:45:06,551 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 03:45:06,565 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 6 minutes, 45 seconds)
2025-05-08 03:47:47,321 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 03:47:59,877 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2113.93579 ± 343.132
2025-05-08 03:47:59,877 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2185.8428, 1635.0883, 2366.6626, 2112.4858, 2540.82, 1634.7721, 1775.0371, 2436.9695, 1891.939, 2559.739]
2025-05-08 03:47:59,877 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 03:47:59,886 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 3 minutes, 37 seconds)
2025-05-08 03:50:40,862 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 03:50:53,362 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2220.25586 ± 659.886
2025-05-08 03:50:53,362 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2645.5007, 2157.3865, 2550.4485, 2050.9998, 376.01205, 2510.0183, 2878.544, 2546.942, 2189.951, 2296.7566]
2025-05-08 03:50:53,362 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 03:50:53,373 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 34 seconds)
2025-05-08 03:53:34,817 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 03:53:47,381 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1956.75269 ± 485.352
2025-05-08 03:53:47,381 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2172.7224, 2178.4954, 1336.409, 1157.4576, 2303.6277, 2474.0464, 2405.6821, 2181.31, 2148.919, 1208.8596]
2025-05-08 03:53:47,381 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 03:53:47,390 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 81/100 (estimated time remaining: 57 minutes, 44 seconds)
2025-05-08 03:56:29,405 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 03:56:41,900 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2311.00195 ± 285.795
2025-05-08 03:56:41,900 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2308.262, 2710.1616, 2440.1067, 2277.3655, 2703.34, 1991.2047, 2485.9165, 1766.2076, 2077.4727, 2349.981]
2025-05-08 03:56:41,900 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 03:56:41,911 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 82/100 (estimated time remaining: 54 minutes, 57 seconds)
2025-05-08 03:59:24,099 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 03:59:36,585 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2330.91699 ± 245.359
2025-05-08 03:59:36,585 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2609.4966, 2514.3638, 2277.631, 2034.1182, 2196.2505, 2209.6394, 2316.8508, 2366.9915, 2815.02, 1968.8085]
2025-05-08 03:59:36,585 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 03:59:36,595 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 83/100 (estimated time remaining: 52 minutes, 12 seconds)
2025-05-08 04:02:18,857 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 04:02:31,390 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2200.32129 ± 466.679
2025-05-08 04:02:31,391 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2602.4475, 2243.416, 2272.6729, 2377.9272, 2954.629, 2061.1028, 2063.1824, 2345.4045, 1051.0676, 2031.3636]
2025-05-08 04:02:31,391 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 04:02:31,400 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 84/100 (estimated time remaining: 49 minutes, 23 seconds)
2025-05-08 04:05:13,880 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 04:05:26,396 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2153.60693 ± 644.609
2025-05-08 04:05:26,397 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1765.2822, 2387.0955, 516.807, 2460.0264, 2885.6973, 2790.8105, 2529.6948, 2238.813, 1865.6866, 2096.1536]
2025-05-08 04:05:26,397 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 04:05:26,406 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 85/100 (estimated time remaining: 46 minutes, 33 seconds)
2025-05-08 04:08:08,513 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 04:08:21,014 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2221.44556 ± 540.086
2025-05-08 04:08:21,014 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1528.6761, 2637.187, 2713.0745, 2368.3037, 1922.4976, 2026.8699, 2514.1865, 2928.339, 2462.913, 1112.4084]
2025-05-08 04:08:21,014 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 04:08:21,025 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 86/100 (estimated time remaining: 43 minutes, 40 seconds)
2025-05-08 04:11:03,356 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 04:11:15,922 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2285.89990 ± 275.030
2025-05-08 04:11:15,922 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1917.314, 2224.2153, 1937.5753, 2605.3347, 2495.0913, 2360.5405, 2610.1536, 2334.285, 2516.719, 1857.7715]
2025-05-08 04:11:15,923 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 04:11:15,932 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 87/100 (estimated time remaining: 40 minutes, 47 seconds)
2025-05-08 04:13:58,419 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 04:14:10,927 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2336.26099 ± 411.053
2025-05-08 04:14:10,927 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1699.6307, 2363.1003, 2879.752, 2947.3547, 2705.5693, 1745.0491, 2303.789, 2379.2114, 2357.2246, 1981.9274]
2025-05-08 04:14:10,927 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 04:14:10,938 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 88/100 (estimated time remaining: 37 minutes, 53 seconds)
2025-05-08 04:16:53,393 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 04:17:06,037 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2122.20874 ± 810.184
2025-05-08 04:17:06,038 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2628.1406, 2623.0005, 2825.3792, 2425.6943, 2324.2546, 1258.7705, 2519.9324, 2215.5059, 13.462452, 2387.9448]
2025-05-08 04:17:06,038 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 04:17:06,048 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 89/100 (estimated time remaining: 34 minutes, 59 seconds)
2025-05-08 04:19:48,844 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 04:20:01,358 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2518.69482 ± 227.689
2025-05-08 04:20:01,358 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2068.1982, 2508.5562, 2768.1738, 2623.6912, 2457.4814, 2950.9934, 2565.6753, 2516.8528, 2339.1501, 2388.1765]
2025-05-08 04:20:01,358 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 04:20:01,358 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1124 [INFO]: New best (2518.69) for latency ExtremeSparseL4U32
2025-05-08 04:20:01,358 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-08 04:20:01,363 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 04:20:01,379 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 90/100 (estimated time remaining: 32 minutes, 4 seconds)
2025-05-08 04:22:43,214 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 04:22:55,755 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2274.01489 ± 766.024
2025-05-08 04:22:55,755 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2709.2998, 2448.994, 2488.592, 2646.5747, 2364.11, 2190.1003, 2656.4067, 2674.275, 22.1615, 2539.6318]
2025-05-08 04:22:55,755 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 04:22:55,766 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 91/100 (estimated time remaining: 29 minutes, 9 seconds)
2025-05-08 04:25:38,284 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 04:25:50,746 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2491.47095 ± 280.108
2025-05-08 04:25:50,747 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2954.7742, 2581.3145, 1965.8726, 2780.3726, 2678.3215, 2324.4622, 2695.489, 2295.6235, 2324.4155, 2314.0618]
2025-05-08 04:25:50,747 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 04:25:50,758 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 92/100 (estimated time remaining: 26 minutes, 14 seconds)
2025-05-08 04:28:33,126 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 04:28:45,579 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2478.23071 ± 203.360
2025-05-08 04:28:45,580 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2786.0881, 2661.507, 2199.6953, 2585.8647, 2452.9697, 2493.894, 2218.1829, 2738.3313, 2399.0269, 2246.7456]
2025-05-08 04:28:45,580 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 04:28:45,591 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 93/100 (estimated time remaining: 23 minutes, 19 seconds)
2025-05-08 04:31:28,136 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 04:31:40,754 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2590.99829 ± 268.437
2025-05-08 04:31:40,754 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [3044.7747, 2780.7327, 2610.9084, 2558.4539, 2231.4502, 2452.5947, 2307.4731, 2739.0117, 2928.809, 2255.7751]
2025-05-08 04:31:40,754 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 04:31:40,754 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1124 [INFO]: New best (2591.00) for latency ExtremeSparseL4U32
2025-05-08 04:31:40,754 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-08 04:31:40,758 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 04:31:40,774 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 94/100 (estimated time remaining: 20 minutes, 24 seconds)
2025-05-08 04:34:25,165 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 04:34:37,839 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2263.55811 ± 778.407
2025-05-08 04:34:37,839 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2349.6187, 2845.6199, 1998.6835, 2827.085, 1935.9875, 2715.3164, 2720.8157, 2200.509, 2880.741, 161.20337]
2025-05-08 04:34:37,839 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 04:34:37,850 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 95/100 (estimated time remaining: 17 minutes, 31 seconds)
2025-05-08 04:37:21,845 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 04:37:34,496 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2352.61279 ± 344.212
2025-05-08 04:37:34,496 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2811.9895, 2453.8289, 2304.1255, 1975.6405, 2664.6304, 2661.501, 2032.8094, 2136.7305, 2718.4407, 1766.4291]
2025-05-08 04:37:34,496 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 04:37:34,508 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 96/100 (estimated time remaining: 14 minutes, 38 seconds)
2025-05-08 04:40:18,068 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 04:40:30,756 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2477.02173 ± 333.677
2025-05-08 04:40:30,756 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2688.3213, 1983.832, 2662.46, 2734.84, 2639.436, 2463.4636, 2905.7156, 2327.3489, 1783.4539, 2581.346]
2025-05-08 04:40:30,756 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 04:40:30,767 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 97/100 (estimated time remaining: 11 minutes, 44 seconds)
2025-05-08 04:43:13,762 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 04:43:26,543 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2321.43188 ± 765.240
2025-05-08 04:43:26,543 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2729.9333, 2690.3923, 2511.0996, 2631.6772, 2763.2612, 2000.014, 2842.0618, 2484.7754, 127.8235, 2433.2786]
2025-05-08 04:43:26,543 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 04:43:26,554 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 98/100 (estimated time remaining: 8 minutes, 48 seconds)
2025-05-08 04:46:14,094 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 04:46:26,806 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2364.73389 ± 605.087
2025-05-08 04:46:26,806 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2799.5645, 2683.73, 2719.4792, 2906.005, 2869.4016, 1028.5347, 2461.9043, 2258.5461, 2490.8076, 1429.3646]
2025-05-08 04:46:26,806 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 04:46:26,818 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 99/100 (estimated time remaining: 5 minutes, 54 seconds)
2025-05-08 04:49:13,556 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 04:49:26,165 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2490.07349 ± 374.417
2025-05-08 04:49:26,165 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2949.8733, 2510.405, 2530.7053, 3025.7307, 2611.5852, 2720.3242, 1878.0631, 1818.5848, 2412.5254, 2442.9385]
2025-05-08 04:49:26,165 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 04:49:26,177 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 57 seconds)
2025-05-08 04:52:13,120 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 04:52:25,814 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2218.46265 ± 624.089
2025-05-08 04:52:25,815 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1952.8188, 1349.0928, 2735.6543, 2470.8525, 852.4498, 2790.6753, 2576.3308, 2143.3147, 2542.626, 2770.8145]
2025-05-08 04:52:25,815 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 04:52:25,826 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1149 [DEBUG]: Training session finished
