2025-05-06 07:40:03,283 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1006 [DEBUG]: logdir: _logs/benchmark-v3-tc3/noisy-halfcheetah/SparseU15-sac-aug-mem32
2025-05-06 07:40:03,283 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1007 [DEBUG]: trainer_prefix: benchmark-v3-tc3/noisy-halfcheetah/SparseU15-sac-aug-mem32
2025-05-06 07:40:03,283 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1008 [DEBUG]: args.trainer_eval_latencies: {'SparseU15': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x79ad0c3c7d00>}
2025-05-06 07:40:03,284 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1009 [DEBUG]: using device: cpu
2025-05-06 07:40:03,289 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1031 [INFO]: Creating new trainer
2025-05-06 07:40:03,296 baseline-sac-noisy-halfcheetah:105 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=209, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-05-06 07:40:03,296 baseline-sac-noisy-halfcheetah:106 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=215, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-06 07:40:03,523 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1092 [DEBUG]: Starting training session...
2025-05-06 07:40:03,523 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 1/100
2025-05-06 07:42:56,765 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 07:43:21,585 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -91.44791 ± 171.234
2025-05-06 07:43:21,586 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-152.95262, -53.159492, -173.18372, -162.61176, 87.76223, -294.02017, -128.48428, -268.36417, -90.80605, 321.3408]
2025-05-06 07:43:21,586 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 07:43:21,586 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1124 [INFO]: New best (-91.45) for latency SparseU15
2025-05-06 07:43:21,586 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-06 07:43:21,590 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc3/noisy-halfcheetah/SparseU15-sac-aug-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 07:43:21,596 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 2/100 (estimated time remaining: 5 hours, 26 minutes, 49 seconds)
2025-05-06 07:46:22,967 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 07:46:47,819 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -364.90732 ± 71.379
2025-05-06 07:46:47,819 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-413.58463, -489.31073, -274.33313, -348.16922, -244.42767, -334.02786, -366.7997, -329.0271, -443.54205, -405.85117]
2025-05-06 07:46:47,819 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 07:46:47,820 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 3/100 (estimated time remaining: 5 hours, 30 minutes, 10 seconds)
2025-05-06 07:49:48,513 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 07:50:13,420 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -333.73929 ± 75.809
2025-05-06 07:50:13,420 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-288.4659, -340.33212, -214.88994, -345.50317, -346.44525, -332.45282, -451.43173, -274.01987, -473.45288, -270.39923]
2025-05-06 07:50:13,420 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 07:50:13,421 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 4/100 (estimated time remaining: 5 hours, 28 minutes, 40 seconds)
2025-05-06 07:53:13,875 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 07:53:38,753 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -204.28459 ± 90.768
2025-05-06 07:53:38,753 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-268.1608, -273.9612, -214.11453, -141.90555, -140.9148, -40.89558, -280.83304, -125.2971, -189.85356, -366.90988]
2025-05-06 07:53:38,754 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 07:53:38,755 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 5/100 (estimated time remaining: 5 hours, 26 minutes, 5 seconds)
2025-05-06 07:56:39,192 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 07:57:04,033 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -305.42792 ± 108.032
2025-05-06 07:57:04,033 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-157.21992, -268.8736, -430.48724, -376.43158, -410.23508, -150.12843, -158.25961, -313.82205, -399.84256, -388.97913]
2025-05-06 07:57:04,033 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 07:57:04,035 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 6/100 (estimated time remaining: 5 hours, 23 minutes, 9 seconds)
2025-05-06 08:00:04,983 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 08:00:29,890 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -332.15155 ± 128.589
2025-05-06 08:00:29,891 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-566.23157, -431.28836, -278.02704, -510.7326, -294.93637, -244.33894, -362.6869, -176.56793, -157.20053, -299.50534]
2025-05-06 08:00:29,891 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 08:00:29,892 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 7/100 (estimated time remaining: 5 hours, 22 minutes, 11 seconds)
2025-05-06 08:03:30,747 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 08:03:55,735 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -224.18843 ± 59.137
2025-05-06 08:03:55,735 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-300.62656, -154.9472, -113.73988, -204.55798, -322.32114, -204.24905, -222.1417, -249.21146, -214.3465, -255.74287]
2025-05-06 08:03:55,735 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 08:03:55,737 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 8/100 (estimated time remaining: 5 hours, 18 minutes, 39 seconds)
2025-05-06 08:06:56,744 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 08:07:21,648 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -329.02249 ± 107.266
2025-05-06 08:07:21,648 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-170.6956, -235.50673, -258.41733, -315.9169, -451.8596, -529.0427, -391.28363, -261.36972, -259.40182, -416.7308]
2025-05-06 08:07:21,648 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 08:07:21,650 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 9/100 (estimated time remaining: 5 hours, 15 minutes, 19 seconds)
2025-05-06 08:10:22,495 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 08:10:47,480 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -349.68042 ± 84.641
2025-05-06 08:10:47,480 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-323.45645, -391.14084, -419.32773, -182.30025, -410.81308, -317.15573, -424.83713, -215.88226, -435.02505, -376.8656]
2025-05-06 08:10:47,481 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 08:10:47,483 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 10/100 (estimated time remaining: 5 hours, 12 minutes, 2 seconds)
2025-05-06 08:13:48,459 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 08:14:13,301 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -363.61115 ± 137.062
2025-05-06 08:14:13,301 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-283.0686, -253.19514, -352.63293, -213.25014, -386.6374, -526.69415, -682.5777, -336.3602, -367.8, -233.8955]
2025-05-06 08:14:13,301 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 08:14:13,303 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 11/100 (estimated time remaining: 5 hours, 8 minutes, 46 seconds)
2025-05-06 08:17:15,038 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 08:17:40,166 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -339.24899 ± 142.147
2025-05-06 08:17:40,166 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-158.73222, -386.87952, -287.58096, -564.2425, -260.47388, -100.63841, -300.9875, -370.01443, -544.3462, -418.59433]
2025-05-06 08:17:40,166 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 08:17:40,168 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 12/100 (estimated time remaining: 5 hours, 5 minutes, 38 seconds)
2025-05-06 08:20:42,264 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 08:21:07,180 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -238.74577 ± 114.806
2025-05-06 08:21:07,180 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-458.5465, -148.55142, -119.03665, -241.40872, -352.14862, -130.24652, -393.31512, -221.14394, -150.12111, -172.93925]
2025-05-06 08:21:07,180 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 08:21:07,183 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 13/100 (estimated time remaining: 5 hours, 2 minutes, 33 seconds)
2025-05-06 08:24:08,452 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 08:24:33,397 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -324.36221 ± 101.865
2025-05-06 08:24:33,397 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-394.63858, -398.49887, -309.115, -344.14005, -287.18695, -83.71088, -275.11722, -293.69702, -492.97675, -364.54068]
2025-05-06 08:24:33,397 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 08:24:33,399 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 14/100 (estimated time remaining: 4 hours, 59 minutes, 12 seconds)
2025-05-06 08:27:35,698 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 08:28:00,744 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -286.54837 ± 79.590
2025-05-06 08:28:00,744 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-278.75925, -257.97498, -247.1995, -410.20282, -243.93138, -413.51303, -273.6031, -134.47844, -346.8746, -258.9466]
2025-05-06 08:28:00,744 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 08:28:00,747 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 15/100 (estimated time remaining: 4 hours, 56 minutes, 12 seconds)
2025-05-06 08:31:03,044 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 08:31:28,085 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -260.54184 ± 64.479
2025-05-06 08:31:28,085 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-328.9815, -255.60341, -189.04024, -277.68643, -323.00003, -226.66571, -167.57405, -227.49152, -224.75272, -384.62277]
2025-05-06 08:31:28,085 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 08:31:28,088 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 16/100 (estimated time remaining: 4 hours, 53 minutes, 11 seconds)
2025-05-06 08:34:29,107 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 08:34:54,070 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -251.80460 ± 105.307
2025-05-06 08:34:54,070 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-255.91586, -83.02929, -235.81767, -283.12772, -188.80734, -375.1339, -192.57274, -143.41638, -297.06073, -463.16434]
2025-05-06 08:34:54,070 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 08:34:54,073 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 17/100 (estimated time remaining: 4 hours, 49 minutes, 29 seconds)
2025-05-06 08:37:54,928 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 08:38:19,848 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -312.91510 ± 101.852
2025-05-06 08:38:19,848 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-355.03528, -348.47174, -202.0629, -234.24529, -461.17917, -334.87912, -463.693, -310.38498, -298.11328, -121.086296]
2025-05-06 08:38:19,848 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 08:38:19,851 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 18/100 (estimated time remaining: 4 hours, 45 minutes, 42 seconds)
2025-05-06 08:41:20,584 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 08:41:45,539 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -288.19818 ± 62.033
2025-05-06 08:41:45,539 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-371.0524, -356.8183, -320.93567, -189.9979, -370.44104, -243.64977, -215.74461, -241.79279, -289.58835, -281.961]
2025-05-06 08:41:45,539 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 08:41:45,542 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 19/100 (estimated time remaining: 4 hours, 42 minutes, 7 seconds)
2025-05-06 08:44:46,588 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 08:45:11,561 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -299.63525 ± 78.562
2025-05-06 08:45:11,561 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-189.412, -396.34848, -265.3696, -276.3647, -290.2253, -326.2289, -351.32483, -152.28586, -341.66162, -407.13144]
2025-05-06 08:45:11,561 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 08:45:11,565 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 20/100 (estimated time remaining: 4 hours, 38 minutes, 19 seconds)
2025-05-06 08:48:12,659 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 08:48:37,619 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -264.83127 ± 83.080
2025-05-06 08:48:37,619 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-219.31934, -380.87183, -249.55824, -344.802, -357.71942, -310.03503, -246.90317, -99.09887, -266.7866, -173.21819]
2025-05-06 08:48:37,619 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 08:48:37,623 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 21/100 (estimated time remaining: 4 hours, 34 minutes, 32 seconds)
2025-05-06 08:51:39,409 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 08:52:04,360 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -222.54825 ± 94.907
2025-05-06 08:52:04,360 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-68.50082, -312.31723, -213.49905, -314.08105, -271.77426, -239.77837, -290.4247, -26.246868, -205.66283, -283.19733]
2025-05-06 08:52:04,360 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 08:52:04,363 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 22/100 (estimated time remaining: 4 hours, 31 minutes, 18 seconds)
2025-05-06 08:55:05,889 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 08:55:30,799 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -304.18100 ± 78.910
2025-05-06 08:55:30,799 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-253.64551, -304.24408, -200.73108, -416.0086, -157.87062, -405.04337, -324.37265, -329.00436, -364.16595, -286.72372]
2025-05-06 08:55:30,799 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 08:55:30,803 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 23/100 (estimated time remaining: 4 hours, 28 minutes, 2 seconds)
2025-05-06 08:58:32,266 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 08:58:57,610 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -205.81519 ± 106.846
2025-05-06 08:58:57,610 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-279.07703, -218.2438, -223.69283, -177.11511, -397.54425, -294.707, -70.21198, -225.82138, 1.5157732, -173.25424]
2025-05-06 08:58:57,610 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 08:58:57,614 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 24/100 (estimated time remaining: 4 hours, 24 minutes, 53 seconds)
2025-05-06 09:01:58,909 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 09:02:24,247 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -204.64828 ± 86.165
2025-05-06 09:02:24,247 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-298.78784, -224.70302, -201.66205, 11.1891165, -254.86217, -170.31422, -296.72784, -230.17375, -139.5361, -240.90498]
2025-05-06 09:02:24,247 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 09:02:24,251 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 25/100 (estimated time remaining: 4 hours, 21 minutes, 36 seconds)
2025-05-06 09:05:25,470 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 09:05:50,853 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -215.13847 ± 55.089
2025-05-06 09:05:50,853 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-197.61656, -259.15967, -199.52551, -247.06812, -107.77749, -165.05774, -168.50932, -232.91322, -290.16077, -283.59644]
2025-05-06 09:05:50,853 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 09:05:50,857 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 26/100 (estimated time remaining: 4 hours, 18 minutes, 18 seconds)
2025-05-06 09:08:51,870 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 09:09:17,152 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -252.74028 ± 119.448
2025-05-06 09:09:17,152 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-273.04465, -156.27832, -232.58554, -196.6295, -103.47463, -411.73102, -188.3827, -476.2433, -362.02728, -127.00621]
2025-05-06 09:09:17,152 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 09:09:17,156 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 27/100 (estimated time remaining: 4 hours, 14 minutes, 45 seconds)
2025-05-06 09:12:17,862 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 09:12:43,234 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -226.46220 ± 86.548
2025-05-06 09:12:43,234 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-155.95639, -242.61378, -302.75693, -237.9826, -429.0121, -90.84501, -217.77158, -197.25616, -168.56787, -221.85962]
2025-05-06 09:12:43,235 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 09:12:43,239 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 28/100 (estimated time remaining: 4 hours, 11 minutes, 13 seconds)
2025-05-06 09:15:43,520 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 09:16:08,796 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -233.36221 ± 85.002
2025-05-06 09:16:08,796 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-164.72003, -219.23428, -351.51978, -211.9533, -274.329, -393.2333, -98.84778, -260.4111, -156.26527, -203.10803]
2025-05-06 09:16:08,796 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 09:16:08,801 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 29/100 (estimated time remaining: 4 hours, 7 minutes, 29 seconds)
2025-05-06 09:19:09,097 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 09:19:34,434 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -203.08374 ± 94.737
2025-05-06 09:19:34,435 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-229.36125, -199.04977, -266.5411, -286.32077, -371.83942, -60.39306, -55.113556, -251.88174, -141.35966, -168.977]
2025-05-06 09:19:34,435 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 09:19:34,439 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 30/100 (estimated time remaining: 4 hours, 3 minutes, 48 seconds)
2025-05-06 09:22:34,638 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 09:23:00,038 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -256.76508 ± 91.540
2025-05-06 09:23:00,038 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-464.83002, -343.91052, -284.02936, -136.99802, -230.56625, -226.42136, -215.37106, -136.90901, -260.84882, -267.76602]
2025-05-06 09:23:00,038 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 09:23:00,043 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 31/100 (estimated time remaining: 4 hours, 8 seconds)
2025-05-06 09:26:00,679 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 09:26:26,085 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -260.50391 ± 54.773
2025-05-06 09:26:26,085 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-227.07031, -248.14696, -274.38974, -262.62448, -311.9677, -207.48401, -269.22247, -203.6862, -209.08151, -391.36575]
2025-05-06 09:26:26,085 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 09:26:26,090 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 32/100 (estimated time remaining: 3 hours, 56 minutes, 39 seconds)
2025-05-06 09:29:26,275 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 09:29:51,623 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -226.79179 ± 82.063
2025-05-06 09:29:51,623 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-197.16072, -212.79213, -405.00153, -166.78796, -186.09587, -228.45232, -329.36417, -94.03693, -200.7478, -247.47849]
2025-05-06 09:29:51,623 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 09:29:51,628 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 33/100 (estimated time remaining: 3 hours, 53 minutes, 6 seconds)
2025-05-06 09:32:52,532 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 09:33:17,731 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -194.53325 ± 87.664
2025-05-06 09:33:17,732 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-240.19957, -285.22076, -232.85393, -131.51341, -207.76721, -271.46228, -267.31427, -60.797695, -19.451384, -228.75201]
2025-05-06 09:33:17,732 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 09:33:17,737 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 34/100 (estimated time remaining: 3 hours, 49 minutes, 47 seconds)
2025-05-06 09:36:18,296 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 09:36:43,561 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -178.02829 ± 92.900
2025-05-06 09:36:43,562 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-220.6266, 19.015606, -286.6643, -217.9977, -300.54993, -226.87935, -107.64986, -136.04836, -93.40749, -209.47507]
2025-05-06 09:36:43,562 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 09:36:43,567 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 35/100 (estimated time remaining: 3 hours, 46 minutes, 24 seconds)
2025-05-06 09:39:43,872 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 09:40:09,073 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -255.26746 ± 119.578
2025-05-06 09:40:09,073 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-334.9924, -505.41272, -271.44, -264.45352, -222.52586, -68.04587, -157.20265, -199.84406, -153.16483, -375.59244]
2025-05-06 09:40:09,073 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 09:40:09,078 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 36/100 (estimated time remaining: 3 hours, 42 minutes, 57 seconds)
2025-05-06 09:43:09,311 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 09:43:34,637 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -224.29178 ± 62.477
2025-05-06 09:43:34,638 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-312.68045, -199.23486, -197.63406, -277.53497, -242.35246, -259.18347, -135.14432, -279.91797, -104.48615, -234.74895]
2025-05-06 09:43:34,638 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 09:43:34,643 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 37/100 (estimated time remaining: 3 hours, 39 minutes, 25 seconds)
2025-05-06 09:46:35,105 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 09:47:00,402 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -159.78188 ± 73.282
2025-05-06 09:47:00,402 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-207.6315, -146.54703, -59.38127, -112.63499, -254.46713, -189.21222, -226.62984, -253.46414, -50.30775, -97.542854]
2025-05-06 09:47:00,402 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 09:47:00,408 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 38/100 (estimated time remaining: 3 hours, 36 minutes, 2 seconds)
2025-05-06 09:50:01,544 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 09:50:27,093 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -209.79367 ± 46.746
2025-05-06 09:50:27,093 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-219.96475, -285.20242, -183.54271, -276.65045, -120.45009, -170.9813, -240.03873, -202.7264, -209.78044, -188.59947]
2025-05-06 09:50:27,093 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 09:50:27,099 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 39/100 (estimated time remaining: 3 hours, 32 minutes, 44 seconds)
2025-05-06 09:53:28,143 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 09:53:53,064 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -252.81265 ± 47.170
2025-05-06 09:53:53,064 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-274.05783, -252.05664, -194.92598, -244.72832, -277.2074, -142.81831, -301.4966, -276.96512, -259.15222, -304.71793]
2025-05-06 09:53:53,064 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 09:53:53,070 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 40/100 (estimated time remaining: 3 hours, 29 minutes, 19 seconds)
2025-05-06 09:56:54,104 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 09:57:19,172 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -238.47884 ± 57.078
2025-05-06 09:57:19,172 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-180.41771, -172.4478, -243.7709, -232.5691, -297.7262, -164.4538, -271.8467, -331.10724, -300.0422, -190.4065]
2025-05-06 09:57:19,172 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 09:57:19,178 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 41/100 (estimated time remaining: 3 hours, 26 minutes, 1 second)
2025-05-06 10:00:19,596 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 10:00:44,517 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -222.02426 ± 60.068
2025-05-06 10:00:44,517 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-260.14548, -252.53943, -231.29234, -172.45471, -273.02988, -279.8287, -294.05817, -206.4528, -102.44533, -147.99571]
2025-05-06 10:00:44,517 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 10:00:44,523 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 42/100 (estimated time remaining: 3 hours, 22 minutes, 32 seconds)
2025-05-06 10:03:44,461 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 10:04:09,355 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -234.95566 ± 25.222
2025-05-06 10:04:09,355 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-232.2709, -202.85526, -202.05956, -240.11835, -232.83734, -242.32895, -236.06783, -244.16934, -219.89987, -296.94922]
2025-05-06 10:04:09,355 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 10:04:09,361 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 43/100 (estimated time remaining: 3 hours, 18 minutes, 55 seconds)
2025-05-06 10:07:10,427 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 10:07:35,296 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -297.55414 ± 117.130
2025-05-06 10:07:35,296 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-328.91006, -196.29994, -186.51729, -75.60522, -290.54935, -311.3852, -420.1734, -466.67612, -262.96558, -436.45926]
2025-05-06 10:07:35,296 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 10:07:35,303 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 44/100 (estimated time remaining: 3 hours, 15 minutes, 21 seconds)
2025-05-06 10:10:35,163 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 10:11:00,130 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -242.57071 ± 41.355
2025-05-06 10:11:00,130 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-176.2171, -252.42236, -202.22897, -339.15738, -237.70758, -257.20462, -241.62721, -249.2507, -209.97758, -259.91354]
2025-05-06 10:11:00,130 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 10:11:00,137 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 45/100 (estimated time remaining: 3 hours, 11 minutes, 43 seconds)
2025-05-06 10:14:00,008 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 10:14:24,961 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -248.49451 ± 68.391
2025-05-06 10:14:24,961 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-283.7333, -272.71448, -222.158, -202.18004, -73.2496, -287.04822, -237.25055, -276.74472, -323.78592, -306.0804]
2025-05-06 10:14:24,961 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 10:14:24,968 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 46/100 (estimated time remaining: 3 hours, 8 minutes, 3 seconds)
2025-05-06 10:17:24,666 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 10:17:49,661 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -244.65884 ± 31.359
2025-05-06 10:17:49,661 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-243.39624, -272.06512, -258.53506, -240.93173, -186.13472, -218.77719, -299.36966, -253.41656, -208.7683, -265.19385]
2025-05-06 10:17:49,661 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 10:17:49,668 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 47/100 (estimated time remaining: 3 hours, 4 minutes, 31 seconds)
2025-05-06 10:20:49,510 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 10:21:14,489 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -237.66495 ± 77.000
2025-05-06 10:21:14,489 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-305.45404, -361.99753, -190.04253, -210.3514, -137.0711, -261.38135, -111.2962, -205.76553, -268.3905, -324.89944]
2025-05-06 10:21:14,489 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 10:21:14,496 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 48/100 (estimated time remaining: 3 hours, 1 minute, 6 seconds)
2025-05-06 10:24:14,433 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 10:24:39,401 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -283.82501 ± 35.525
2025-05-06 10:24:39,401 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-218.25934, -265.82697, -302.30997, -282.84448, -312.29535, -316.6903, -333.45755, -267.13153, -305.93655, -233.49814]
2025-05-06 10:24:39,401 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 10:24:39,408 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 49/100 (estimated time remaining: 2 hours, 57 minutes, 30 seconds)
2025-05-06 10:27:39,367 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 10:28:04,286 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -237.86447 ± 78.759
2025-05-06 10:28:04,286 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-248.26666, -294.972, -292.51276, -37.39604, -264.30322, -201.44623, -322.52686, -181.61017, -293.6499, -241.96097]
2025-05-06 10:28:04,286 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 10:28:04,293 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 50/100 (estimated time remaining: 2 hours, 54 minutes, 6 seconds)
2025-05-06 10:31:04,931 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 10:31:29,952 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -283.36240 ± 57.967
2025-05-06 10:31:29,953 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-245.66768, -266.38263, -369.81323, -274.73428, -262.05896, -224.62221, -258.81442, -266.37808, -246.8281, -418.32434]
2025-05-06 10:31:29,953 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 10:31:29,959 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 51/100 (estimated time remaining: 2 hours, 50 minutes, 49 seconds)
2025-05-06 10:34:30,424 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 10:34:55,357 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -288.65396 ± 85.289
2025-05-06 10:34:55,358 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-288.73584, -287.21448, -481.43024, -313.63293, -256.87033, -133.39766, -317.59442, -301.69965, -309.0482, -196.91599]
2025-05-06 10:34:55,358 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 10:34:55,365 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 52/100 (estimated time remaining: 2 hours, 47 minutes, 31 seconds)
2025-05-06 10:37:56,091 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 10:38:20,955 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -243.22537 ± 59.750
2025-05-06 10:38:20,955 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-263.73987, -274.42432, -291.5992, -341.88797, -141.82587, -163.42249, -290.37247, -256.9911, -210.74184, -197.24843]
2025-05-06 10:38:20,955 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 10:38:20,962 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 53/100 (estimated time remaining: 2 hours, 44 minutes, 14 seconds)
2025-05-06 10:41:21,793 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 10:41:46,613 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -309.48920 ± 52.565
2025-05-06 10:41:46,613 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-302.48645, -244.1181, -227.16864, -347.4858, -326.02115, -389.57645, -309.85004, -256.24918, -384.0774, -307.85883]
2025-05-06 10:41:46,613 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 10:41:46,620 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 54/100 (estimated time remaining: 2 hours, 40 minutes, 55 seconds)
2025-05-06 10:44:49,339 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 10:45:14,410 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -279.89615 ± 31.310
2025-05-06 10:45:14,411 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-268.0802, -269.05273, -220.7649, -296.4433, -274.8, -275.8179, -273.75735, -281.41583, -283.56946, -355.25986]
2025-05-06 10:45:14,411 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 10:45:14,418 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 55/100 (estimated time remaining: 2 hours, 37 minutes, 57 seconds)
2025-05-06 10:48:17,070 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 10:48:42,083 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -282.77643 ± 81.533
2025-05-06 10:48:42,083 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-294.17343, -375.8944, -301.2169, -223.94525, -414.13297, -118.78131, -191.69142, -290.75046, -312.61118, -304.56683]
2025-05-06 10:48:42,083 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 10:48:42,090 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 56/100 (estimated time remaining: 2 hours, 34 minutes, 49 seconds)
2025-05-06 10:51:44,963 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 10:52:09,908 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -251.71545 ± 30.490
2025-05-06 10:52:09,908 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-263.01807, -253.40659, -245.38861, -271.84995, -292.53888, -221.60414, -233.30432, -283.14758, -184.3562, -268.54]
2025-05-06 10:52:09,908 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 10:52:09,930 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 57/100 (estimated time remaining: 2 hours, 31 minutes, 44 seconds)
2025-05-06 10:55:12,965 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 10:55:37,918 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -270.34048 ± 77.975
2025-05-06 10:55:37,919 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-293.6716, -143.05917, -322.8217, -252.78511, -244.46442, -211.72987, -185.6802, -265.73853, -402.45682, -380.9973]
2025-05-06 10:55:37,919 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 10:55:37,926 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 58/100 (estimated time remaining: 2 hours, 28 minutes, 37 seconds)
2025-05-06 10:58:40,775 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 10:59:05,821 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -319.88834 ± 33.044
2025-05-06 10:59:05,822 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-338.17328, -330.745, -292.3145, -343.48532, -246.00278, -297.003, -352.81976, -362.90652, -307.16312, -328.27008]
2025-05-06 10:59:05,822 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 10:59:05,829 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 59/100 (estimated time remaining: 2 hours, 25 minutes, 29 seconds)
2025-05-06 11:02:09,040 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 11:02:33,974 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -259.10687 ± 64.968
2025-05-06 11:02:33,974 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-137.78203, -301.91302, -286.24164, -206.89459, -223.84955, -222.49294, -281.26877, -283.3064, -395.07394, -252.24585]
2025-05-06 11:02:33,974 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 11:02:33,982 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 22 minutes, 4 seconds)
2025-05-06 11:05:37,698 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 11:06:02,649 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -287.77029 ± 78.490
2025-05-06 11:06:02,649 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-221.60757, -219.65622, -222.32611, -331.8045, -225.07867, -284.51883, -364.0146, -250.90614, -475.4552, -282.3351]
2025-05-06 11:06:02,649 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 11:06:02,657 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 61/100 (estimated time remaining: 2 hours, 18 minutes, 44 seconds)
2025-05-06 11:09:06,504 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 11:09:31,542 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -274.98645 ± 91.314
2025-05-06 11:09:31,542 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-222.51726, -163.43758, -290.18045, -120.68389, -426.6482, -302.50928, -393.2272, -311.11023, -211.61964, -307.93094]
2025-05-06 11:09:31,542 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 11:09:31,550 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 62/100 (estimated time remaining: 2 hours, 15 minutes, 24 seconds)
2025-05-06 11:12:36,206 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 11:13:01,363 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -281.41199 ± 73.194
2025-05-06 11:13:01,363 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-226.63696, -213.40198, -147.50577, -359.202, -366.6062, -347.58423, -220.46884, -360.23138, -272.89838, -299.58417]
2025-05-06 11:13:01,363 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 11:13:01,371 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 63/100 (estimated time remaining: 2 hours, 12 minutes, 10 seconds)
2025-05-06 11:16:06,242 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 11:16:31,408 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -286.35538 ± 50.747
2025-05-06 11:16:31,408 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-295.8193, -196.36789, -297.76477, -372.98917, -315.55927, -332.92404, -309.60306, -259.62543, -269.91147, -212.98955]
2025-05-06 11:16:31,408 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 11:16:31,416 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 64/100 (estimated time remaining: 2 hours, 8 minutes, 57 seconds)
2025-05-06 11:19:42,556 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 11:20:07,980 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -296.45377 ± 30.177
2025-05-06 11:20:07,980 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-302.14352, -225.371, -311.41684, -294.53613, -308.30298, -350.8672, -297.2016, -311.9005, -281.66248, -281.13522]
2025-05-06 11:20:07,980 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 11:20:07,989 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 65/100 (estimated time remaining: 2 hours, 6 minutes, 28 seconds)
2025-05-06 11:23:08,794 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 11:23:34,259 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -312.21466 ± 55.684
2025-05-06 11:23:34,259 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-285.6453, -349.4997, -286.52493, -381.23965, -316.96637, -340.315, -387.6897, -294.29126, -184.2042, -295.7705]
2025-05-06 11:23:34,259 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 11:23:34,267 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 66/100 (estimated time remaining: 2 hours, 2 minutes, 41 seconds)
2025-05-06 11:26:34,727 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 11:27:00,094 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -255.61728 ± 54.672
2025-05-06 11:27:00,094 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-222.17722, -268.01062, -211.44707, -229.2571, -225.92851, -329.3287, -208.08939, -238.78891, -237.90538, -385.2399]
2025-05-06 11:27:00,094 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 11:27:00,128 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 58 minutes, 50 seconds)
2025-05-06 11:30:00,715 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 11:30:26,190 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -317.26169 ± 55.977
2025-05-06 11:30:26,190 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-299.6514, -286.56955, -417.84344, -383.73322, -289.3483, -296.38293, -371.82968, -337.52194, -260.0691, -229.66751]
2025-05-06 11:30:26,190 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 11:30:26,199 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 54 minutes, 55 seconds)
2025-05-06 11:33:27,034 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 11:33:52,368 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -293.88959 ± 42.919
2025-05-06 11:33:52,369 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-257.39124, -325.80936, -226.80255, -249.64706, -337.17038, -281.52014, -373.0125, -309.45862, -311.7946, -266.28964]
2025-05-06 11:33:52,369 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 11:33:52,378 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 51 minutes, 2 seconds)
2025-05-06 11:36:51,870 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 11:37:17,353 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -265.78912 ± 27.820
2025-05-06 11:37:17,353 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-265.72314, -270.59924, -287.0514, -273.30267, -261.3663, -324.0354, -252.61772, -217.82022, -274.79648, -230.57883]
2025-05-06 11:37:17,354 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 11:37:17,363 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 46 minutes, 22 seconds)
2025-05-06 11:40:17,080 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 11:40:42,464 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -248.20798 ± 54.181
2025-05-06 11:40:42,464 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-212.13593, -298.42224, -242.95535, -353.33014, -278.0525, -173.37178, -242.39291, -198.16681, -189.95552, -293.29672]
2025-05-06 11:40:42,464 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 11:40:42,474 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 42 minutes, 49 seconds)
2025-05-06 11:43:42,383 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 11:44:07,850 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -292.22784 ± 26.310
2025-05-06 11:44:07,850 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-317.78632, -295.035, -302.12372, -316.03403, -289.1172, -301.54376, -302.5612, -291.2723, -287.88107, -218.9237]
2025-05-06 11:44:07,851 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 11:44:07,860 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 39 minutes, 20 seconds)
2025-05-06 11:47:07,805 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 11:47:33,274 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -232.67679 ± 46.743
2025-05-06 11:47:33,274 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-205.04999, -167.612, -223.49777, -219.89494, -213.0919, -245.5254, -198.27556, -263.59567, -239.79405, -350.4305]
2025-05-06 11:47:33,274 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 11:47:33,285 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 35 minutes, 51 seconds)
2025-05-06 11:50:33,049 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 11:50:58,409 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -325.98462 ± 22.295
2025-05-06 11:50:58,409 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-307.9195, -333.63208, -349.8102, -337.34644, -359.3131, -331.14026, -292.57837, -331.85287, -286.18045, -330.07288]
2025-05-06 11:50:58,409 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 11:50:58,419 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 32 minutes, 20 seconds)
2025-05-06 11:53:58,300 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 11:54:23,632 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -270.24738 ± 53.578
2025-05-06 11:54:23,632 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-251.04733, -408.95135, -222.66713, -220.7757, -262.24692, -222.55115, -309.52173, -283.0694, -251.78435, -269.85898]
2025-05-06 11:54:23,632 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 11:54:23,642 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 28 minutes, 56 seconds)
2025-05-06 11:57:23,361 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 11:57:48,661 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -236.22498 ± 40.578
2025-05-06 11:57:48,661 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-187.19711, -180.96544, -267.416, -245.05814, -207.54092, -235.90031, -214.48581, -257.78363, -239.18527, -326.71707]
2025-05-06 11:57:48,661 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 11:57:48,671 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 25 minutes, 30 seconds)
2025-05-06 12:00:48,719 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 12:01:14,106 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -443.26642 ± 1.739
2025-05-06 12:01:14,106 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-444.15097, -441.98395, -444.30353, -441.22974, -442.70352, -442.02484, -445.80173, -444.57825, -440.45557, -445.43176]
2025-05-06 12:01:14,106 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 12:01:14,116 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 22 minutes, 6 seconds)
2025-05-06 12:04:13,835 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 12:04:39,147 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -258.30359 ± 55.029
2025-05-06 12:04:39,148 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-222.17032, -347.28796, -259.60315, -148.57155, -240.34843, -319.86218, -315.59158, -218.65012, -251.385, -259.56537]
2025-05-06 12:04:39,148 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 12:04:39,158 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 18 minutes, 39 seconds)
2025-05-06 12:07:38,788 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 12:08:04,144 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -300.13101 ± 24.682
2025-05-06 12:08:04,144 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-315.75287, -296.84442, -340.6017, -304.01212, -284.65698, -288.13452, -313.55637, -291.33444, -244.67102, -321.7457]
2025-05-06 12:08:04,145 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 12:08:04,155 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 15 minutes, 13 seconds)
2025-05-06 12:11:03,956 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 12:11:29,288 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -409.35510 ± 8.085
2025-05-06 12:11:29,289 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-392.44794, -406.60803, -407.191, -417.54526, -422.28503, -405.437, -410.04483, -405.80612, -407.39304, -418.79272]
2025-05-06 12:11:29,289 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 12:11:29,299 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 11 minutes, 47 seconds)
2025-05-06 12:14:28,843 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 12:14:54,146 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -437.81485 ± 3.098
2025-05-06 12:14:54,146 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-433.5923, -437.66214, -443.98468, -436.9844, -441.41858, -439.3204, -434.4029, -438.3075, -434.33356, -438.14206]
2025-05-06 12:14:54,146 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 12:14:54,157 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 8 minutes, 21 seconds)
2025-05-06 12:17:53,836 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 12:18:19,080 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 241.97031 ± 32.499
2025-05-06 12:18:19,080 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [176.98961, 271.77673, 300.1131, 233.13007, 236.14525, 214.15561, 235.16371, 229.22679, 251.17982, 271.82214]
2025-05-06 12:18:19,080 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 12:18:19,081 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1124 [INFO]: New best (241.97) for latency SparseU15
2025-05-06 12:18:19,081 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-06 12:18:19,085 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc3/noisy-halfcheetah/SparseU15-sac-aug-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 12:18:19,100 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 82/100 (estimated time remaining: 1 hour, 4 minutes, 54 seconds)
2025-05-06 12:21:18,538 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 12:21:43,823 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 355.12543 ± 26.846
2025-05-06 12:21:43,823 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [397.84296, 343.14877, 340.21832, 350.43384, 357.48355, 360.80035, 291.37524, 382.3754, 364.06445, 363.5117]
2025-05-06 12:21:43,823 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 12:21:43,823 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1124 [INFO]: New best (355.13) for latency SparseU15
2025-05-06 12:21:43,823 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-06 12:21:43,827 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc3/noisy-halfcheetah/SparseU15-sac-aug-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 12:21:43,843 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 83/100 (estimated time remaining: 1 hour, 1 minute, 28 seconds)
2025-05-06 12:24:43,362 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 12:25:08,529 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 414.06259 ± 48.561
2025-05-06 12:25:08,529 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [433.52084, 355.70374, 361.6903, 460.32153, 489.57117, 415.45914, 372.00888, 383.38922, 487.71677, 381.24414]
2025-05-06 12:25:08,529 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 12:25:08,529 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1124 [INFO]: New best (414.06) for latency SparseU15
2025-05-06 12:25:08,530 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-06 12:25:08,533 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc3/noisy-halfcheetah/SparseU15-sac-aug-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 12:25:08,550 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 84/100 (estimated time remaining: 58 minutes, 2 seconds)
2025-05-06 12:28:07,908 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 12:28:33,052 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 407.98413 ± 44.913
2025-05-06 12:28:33,053 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [432.9402, 375.8285, 437.1284, 386.79858, 395.918, 444.32437, 321.71036, 473.2655, 450.89642, 361.03085]
2025-05-06 12:28:33,053 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 12:28:33,064 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 85/100 (estimated time remaining: 54 minutes, 36 seconds)
2025-05-06 12:31:32,566 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 12:31:57,783 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 327.20819 ± 64.547
2025-05-06 12:31:57,784 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [204.21684, 353.0037, 316.53162, 308.65082, 276.2173, 311.88492, 314.0574, 462.46124, 334.10614, 390.9519]
2025-05-06 12:31:57,784 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 12:31:57,795 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 86/100 (estimated time remaining: 51 minutes, 10 seconds)
2025-05-06 12:34:57,537 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 12:35:22,740 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 330.53082 ± 54.802
2025-05-06 12:35:22,740 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [251.4826, 330.14627, 319.4905, 418.58685, 296.12128, 378.63043, 357.17178, 329.82043, 237.74095, 386.11688]
2025-05-06 12:35:22,740 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 12:35:22,751 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 87/100 (estimated time remaining: 47 minutes, 46 seconds)
2025-05-06 12:38:22,613 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 12:38:47,755 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 404.82196 ± 50.324
2025-05-06 12:38:47,755 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [351.8071, 410.335, 403.7495, 300.28833, 475.7724, 397.51184, 477.52438, 383.6947, 426.8251, 420.71103]
2025-05-06 12:38:47,755 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 12:38:47,766 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 88/100 (estimated time remaining: 44 minutes, 22 seconds)
2025-05-06 12:41:47,769 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 12:42:12,949 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 371.75061 ± 53.879
2025-05-06 12:42:12,949 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [422.6278, 335.5614, 296.2371, 341.42233, 479.6333, 385.78357, 348.4293, 363.66855, 317.00623, 427.13647]
2025-05-06 12:42:12,949 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 12:42:12,961 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 89/100 (estimated time remaining: 40 minutes, 58 seconds)
2025-05-06 12:45:13,424 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 12:45:38,702 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 447.70932 ± 50.280
2025-05-06 12:45:38,702 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [403.5402, 414.5477, 496.04245, 438.4842, 522.7768, 543.0917, 418.48444, 400.40906, 434.58267, 405.134]
2025-05-06 12:45:38,702 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 12:45:38,702 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1124 [INFO]: New best (447.71) for latency SparseU15
2025-05-06 12:45:38,703 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-06 12:45:38,706 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc3/noisy-halfcheetah/SparseU15-sac-aug-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 12:45:38,723 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 90/100 (estimated time remaining: 37 minutes, 36 seconds)
2025-05-06 12:48:38,648 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 12:49:03,819 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 448.95743 ± 38.825
2025-05-06 12:49:03,819 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [476.74298, 438.17825, 397.6324, 472.90955, 539.8859, 440.32675, 403.1849, 431.9701, 434.14206, 454.6013]
2025-05-06 12:49:03,819 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 12:49:03,819 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1124 [INFO]: New best (448.96) for latency SparseU15
2025-05-06 12:49:03,820 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-06 12:49:03,824 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc3/noisy-halfcheetah/SparseU15-sac-aug-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 12:49:03,840 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 91/100 (estimated time remaining: 34 minutes, 12 seconds)
2025-05-06 12:52:03,645 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 12:52:28,931 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 475.58676 ± 53.418
2025-05-06 12:52:28,931 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [359.38025, 455.73605, 473.22324, 532.10223, 454.60736, 473.80933, 557.7693, 491.46527, 434.35602, 523.4185]
2025-05-06 12:52:28,931 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 12:52:28,931 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1124 [INFO]: New best (475.59) for latency SparseU15
2025-05-06 12:52:28,931 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-06 12:52:28,935 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc3/noisy-halfcheetah/SparseU15-sac-aug-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 12:52:28,952 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 92/100 (estimated time remaining: 30 minutes, 47 seconds)
2025-05-06 12:55:28,594 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 12:55:53,813 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 547.96332 ± 43.260
2025-05-06 12:55:53,814 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [574.69305, 544.7514, 598.76105, 465.64133, 558.8307, 511.5156, 620.3602, 560.3091, 537.93896, 506.83197]
2025-05-06 12:55:53,814 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 12:55:53,814 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1124 [INFO]: New best (547.96) for latency SparseU15
2025-05-06 12:55:53,814 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-06 12:55:53,818 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc3/noisy-halfcheetah/SparseU15-sac-aug-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 12:55:53,835 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 93/100 (estimated time remaining: 27 minutes, 21 seconds)
2025-05-06 12:58:52,961 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 12:59:18,138 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 497.17731 ± 47.717
2025-05-06 12:59:18,138 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [449.196, 503.68005, 610.3423, 462.27948, 511.23355, 450.40353, 448.91223, 524.97723, 485.64066, 525.1079]
2025-05-06 12:59:18,138 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 12:59:18,150 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 94/100 (estimated time remaining: 23 minutes, 55 seconds)
2025-05-06 13:02:17,596 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 13:02:42,828 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 557.36719 ± 65.152
2025-05-06 13:02:42,829 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [433.13626, 538.5583, 521.6076, 594.582, 550.2914, 687.5562, 525.461, 617.75604, 518.5747, 586.1482]
2025-05-06 13:02:42,829 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 13:02:42,829 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1124 [INFO]: New best (557.37) for latency SparseU15
2025-05-06 13:02:42,829 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-06 13:02:42,833 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc3/noisy-halfcheetah/SparseU15-sac-aug-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 13:02:42,850 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 95/100 (estimated time remaining: 20 minutes, 28 seconds)
2025-05-06 13:05:42,242 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 13:06:07,426 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 504.02969 ± 50.492
2025-05-06 13:06:07,426 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [513.9935, 491.6451, 621.05176, 494.19397, 461.78574, 533.6881, 501.83508, 472.34433, 530.01, 419.74915]
2025-05-06 13:06:07,426 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 13:06:07,439 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 96/100 (estimated time remaining: 17 minutes, 3 seconds)
2025-05-06 13:09:06,740 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 13:09:32,021 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 454.59000 ± 38.205
2025-05-06 13:09:32,021 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [539.9321, 453.81128, 503.38266, 415.63223, 425.52087, 445.36172, 456.67935, 464.7462, 415.90833, 424.92514]
2025-05-06 13:09:32,021 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 13:09:32,034 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 97/100 (estimated time remaining: 13 minutes, 38 seconds)
2025-05-06 13:12:31,406 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 13:12:56,642 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 536.21906 ± 47.547
2025-05-06 13:12:56,642 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [535.8313, 518.61, 628.78076, 523.3572, 574.5665, 508.7894, 511.81226, 594.4508, 453.93665, 512.0558]
2025-05-06 13:12:56,642 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 13:12:56,654 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 98/100 (estimated time remaining: 10 minutes, 13 seconds)
2025-05-06 13:15:55,819 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 13:16:21,072 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 482.18204 ± 47.100
2025-05-06 13:16:21,072 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [479.90366, 523.96783, 520.67554, 464.59262, 464.47812, 479.78583, 503.29654, 566.9589, 410.163, 407.998]
2025-05-06 13:16:21,072 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 13:16:21,085 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 99/100 (estimated time remaining: 6 minutes, 49 seconds)
2025-05-06 13:19:20,645 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 13:19:45,849 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 510.57251 ± 59.238
2025-05-06 13:19:45,849 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [489.0189, 486.55582, 523.5942, 433.40356, 474.64575, 460.68967, 565.9693, 593.6663, 618.2929, 459.88873]
2025-05-06 13:19:45,849 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 13:19:45,861 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 100/100 (estimated time remaining: 3 minutes, 24 seconds)
2025-05-06 13:22:45,050 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 13:23:10,286 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 591.32361 ± 36.395
2025-05-06 13:23:10,286 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [585.294, 601.96136, 580.2724, 646.55334, 622.0674, 537.3652, 576.2357, 526.34705, 610.41595, 626.7232]
2025-05-06 13:23:10,286 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 13:23:10,286 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1124 [INFO]: New best (591.32) for latency SparseU15
2025-05-06 13:23:10,286 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-06 13:23:10,290 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc3/noisy-halfcheetah/SparseU15-sac-aug-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 13:23:10,308 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1149 [DEBUG]: Training session finished
