2025-05-07 22:09:25,292 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1006 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-sac
2025-05-07 22:09:25,293 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1007 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-sac
2025-05-07 22:09:25,293 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1008 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x7b512d3c2f10>}
2025-05-07 22:09:25,293 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1009 [DEBUG]: using device: cpu
2025-05-07 22:09:25,297 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1031 [INFO]: Creating new trainer
2025-05-07 22:09:25,302 baseline-sac-noisy-halfcheetah:111 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=17, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-05-07 22:09:25,302 baseline-sac-noisy-halfcheetah:112 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-07 22:09:25,454 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1092 [DEBUG]: Starting training session...
2025-05-07 22:09:25,454 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 1/100
2025-05-07 22:11:53,684 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 22:12:04,992 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -374.21637 ± 62.442
2025-05-07 22:12:04,992 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-407.63214, -478.97302, -377.70023, -320.64218, -382.46432, -284.17972, -420.36386, -299.4953, -322.57983, -448.13333]
2025-05-07 22:12:04,992 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 22:12:04,993 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1124 [INFO]: New best (-374.22) for latency ExtremeSparseL4U32
2025-05-07 22:12:04,993 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-07 22:12:04,996 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-sac/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 22:12:05,000 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 2/100 (estimated time remaining: 4 hours, 23 minutes, 15 seconds)
2025-05-07 22:14:42,627 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 22:14:54,088 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -230.52315 ± 136.826
2025-05-07 22:14:54,089 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-299.8258, -176.46193, -271.46838, -325.0497, -265.23865, -8.49214, -153.9605, -535.8156, -113.82451, -155.09422]
2025-05-07 22:14:54,089 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 22:14:54,089 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1124 [INFO]: New best (-230.52) for latency ExtremeSparseL4U32
2025-05-07 22:14:54,089 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-07 22:14:54,093 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-sac/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 22:14:54,098 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 3/100 (estimated time remaining: 4 hours, 28 minutes, 23 seconds)
2025-05-07 22:17:31,655 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 22:17:43,062 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -282.94800 ± 96.483
2025-05-07 22:17:43,063 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-283.52316, -248.2886, -396.9376, -270.30267, -178.52678, -200.49309, -510.13312, -302.03113, -196.87384, -242.37015]
2025-05-07 22:17:43,063 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 22:17:43,064 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 4/100 (estimated time remaining: 4 hours, 28 minutes, 9 seconds)
2025-05-07 22:20:20,718 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 22:20:32,102 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -319.51813 ± 51.690
2025-05-07 22:20:32,102 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-302.19476, -286.23175, -337.42862, -341.17502, -350.67953, -372.0313, -274.57217, -391.7284, -204.11838, -335.02148]
2025-05-07 22:20:32,102 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 22:20:32,103 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 5/100 (estimated time remaining: 4 hours, 26 minutes, 39 seconds)
2025-05-07 22:23:09,934 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 22:23:21,401 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -221.44080 ± 68.655
2025-05-07 22:23:21,401 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-189.27635, -214.22147, -311.16165, -297.5765, -274.6723, -93.790344, -199.64685, -164.51753, -167.67767, -301.86728]
2025-05-07 22:23:21,401 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 22:23:21,402 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1124 [INFO]: New best (-221.44) for latency ExtremeSparseL4U32
2025-05-07 22:23:21,402 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-07 22:23:21,405 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-sac/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 22:23:21,410 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 6/100 (estimated time remaining: 4 hours, 24 minutes, 43 seconds)
2025-05-07 22:25:58,816 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 22:26:10,291 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -230.26973 ± 43.987
2025-05-07 22:26:10,291 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-145.85704, -215.07056, -264.31662, -209.66315, -216.88431, -252.83424, -263.79718, -205.27596, -211.45248, -317.5457]
2025-05-07 22:26:10,291 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 22:26:10,292 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 7/100 (estimated time remaining: 4 hours, 24 minutes, 51 seconds)
2025-05-07 22:28:47,782 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 22:28:59,182 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -305.55191 ± 42.155
2025-05-07 22:28:59,182 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-246.08727, -311.23465, -377.81927, -313.50162, -337.77536, -288.88748, -222.29669, -316.46805, -327.3998, -314.04895]
2025-05-07 22:28:59,182 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 22:28:59,184 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 8/100 (estimated time remaining: 4 hours, 21 minutes, 58 seconds)
2025-05-07 22:31:38,306 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 22:31:49,670 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -227.04562 ± 58.317
2025-05-07 22:31:49,670 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-137.3575, -210.50056, -203.23837, -181.13458, -172.57352, -258.82385, -298.2332, -268.13794, -204.83029, -335.62646]
2025-05-07 22:31:49,670 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 22:31:49,672 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 9/100 (estimated time remaining: 4 hours, 19 minutes, 37 seconds)
2025-05-07 22:34:28,455 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 22:34:39,918 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -275.76215 ± 90.980
2025-05-07 22:34:39,918 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-203.41393, -299.85904, -232.51228, -201.58986, -204.2435, -333.3205, -407.8514, -140.66286, -300.8281, -433.34003]
2025-05-07 22:34:39,918 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 22:34:39,920 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 10/100 (estimated time remaining: 4 hours, 17 minutes, 10 seconds)
2025-05-07 22:37:18,635 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 22:37:29,994 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -281.08221 ± 83.613
2025-05-07 22:37:29,994 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-203.26321, -195.78127, -247.48817, -197.37978, -427.1348, -181.39073, -334.47815, -342.53455, -382.172, -299.1998]
2025-05-07 22:37:29,994 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 22:37:29,996 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 11/100 (estimated time remaining: 4 hours, 14 minutes, 34 seconds)
2025-05-07 22:40:08,843 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 22:40:20,343 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -221.59799 ± 92.196
2025-05-07 22:40:20,343 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-60.644997, -222.60728, -337.7802, -225.54512, -303.51077, -138.85298, -79.139885, -301.63412, -275.66315, -270.6013]
2025-05-07 22:40:20,344 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 22:40:20,345 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 12/100 (estimated time remaining: 4 hours, 12 minutes, 10 seconds)
2025-05-07 22:42:58,784 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 22:43:10,243 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -339.28888 ± 104.523
2025-05-07 22:43:10,243 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-265.49664, -354.1086, -336.80838, -279.77304, -437.02176, -311.11716, -559.74695, -238.75935, -424.68713, -185.3695]
2025-05-07 22:43:10,244 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 22:43:10,245 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 13/100 (estimated time remaining: 4 hours, 9 minutes, 38 seconds)
2025-05-07 22:45:48,597 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 22:46:00,053 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -291.42773 ± 47.370
2025-05-07 22:46:00,053 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-286.98602, -303.01346, -245.16238, -349.79684, -356.2704, -302.88568, -326.74692, -226.1831, -307.3237, -209.90857]
2025-05-07 22:46:00,053 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 22:46:00,055 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 14/100 (estimated time remaining: 4 hours, 6 minutes, 36 seconds)
2025-05-07 22:48:38,580 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 22:48:50,069 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -264.84705 ± 42.447
2025-05-07 22:48:50,069 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-273.8977, -221.69786, -305.4651, -261.3739, -323.2727, -298.4154, -294.1691, -193.67766, -202.67387, -273.82724]
2025-05-07 22:48:50,069 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 22:48:50,071 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 15/100 (estimated time remaining: 4 hours, 3 minutes, 42 seconds)
2025-05-07 22:51:28,254 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 22:51:39,713 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -286.03690 ± 30.920
2025-05-07 22:51:39,713 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-310.0947, -283.35358, -331.7865, -254.68349, -252.79898, -262.7022, -243.4088, -281.24158, -326.52747, -313.77182]
2025-05-07 22:51:39,713 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 22:51:39,715 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 16/100 (estimated time remaining: 4 hours, 45 seconds)
2025-05-07 22:54:18,004 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 22:54:29,471 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -286.93268 ± 51.190
2025-05-07 22:54:29,471 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-391.11545, -275.50757, -289.5743, -311.6646, -263.85645, -210.69983, -274.34167, -357.49765, -244.83218, -250.2371]
2025-05-07 22:54:29,471 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 22:54:29,474 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 17/100 (estimated time remaining: 3 hours, 57 minutes, 45 seconds)
2025-05-07 22:57:07,866 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 22:57:19,293 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -281.59442 ± 49.739
2025-05-07 22:57:19,293 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-285.67712, -310.91565, -284.07193, -219.18204, -253.85765, -357.57812, -240.92542, -272.3247, -370.80746, -220.6042]
2025-05-07 22:57:19,293 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 22:57:19,295 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 18/100 (estimated time remaining: 3 hours, 54 minutes, 54 seconds)
2025-05-07 22:59:57,317 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 23:00:08,768 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -351.99310 ± 63.926
2025-05-07 23:00:08,768 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-345.22086, -416.76205, -440.65604, -359.81506, -350.2663, -338.35767, -377.21396, -401.7203, -280.25864, -209.6604]
2025-05-07 23:00:08,768 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 23:00:08,770 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 19/100 (estimated time remaining: 3 hours, 51 minutes, 58 seconds)
2025-05-07 23:02:47,028 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 23:02:58,559 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -158.77859 ± 65.859
2025-05-07 23:02:58,559 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-128.62682, -48.476055, -150.97432, -174.32314, -158.40659, -214.87898, -79.66886, -197.79639, -295.46457, -139.17014]
2025-05-07 23:02:58,559 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 23:02:58,559 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1124 [INFO]: New best (-158.78) for latency ExtremeSparseL4U32
2025-05-07 23:02:58,559 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-07 23:02:58,562 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-sac/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 23:02:58,569 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 20/100 (estimated time remaining: 3 hours, 49 minutes, 5 seconds)
2025-05-07 23:05:37,576 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 23:05:49,053 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -341.71967 ± 96.587
2025-05-07 23:05:49,053 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-376.20386, -459.0985, -311.2929, -385.234, -507.09464, -299.5422, -253.55453, -395.60263, -256.48105, -173.0926]
2025-05-07 23:05:49,053 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 23:05:49,056 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 21/100 (estimated time remaining: 3 hours, 46 minutes, 29 seconds)
2025-05-07 23:08:27,731 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 23:08:39,206 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -304.72818 ± 98.120
2025-05-07 23:08:39,206 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-290.50827, -257.61096, -177.6755, -266.45776, -405.54178, -200.33888, -390.47876, -221.76561, -505.29565, -331.60873]
2025-05-07 23:08:39,206 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 23:08:39,209 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 22/100 (estimated time remaining: 3 hours, 43 minutes, 45 seconds)
2025-05-07 23:11:17,929 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 23:11:29,351 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -333.21179 ± 101.074
2025-05-07 23:11:29,351 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-253.51733, -388.8891, -322.1995, -146.40364, -449.1893, -264.7989, -433.42682, -319.74774, -489.90042, -264.04514]
2025-05-07 23:11:29,351 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 23:11:29,354 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 23/100 (estimated time remaining: 3 hours, 41 minutes)
2025-05-07 23:14:08,602 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 23:14:20,047 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -386.87360 ± 96.076
2025-05-07 23:14:20,047 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-281.8502, -452.18924, -341.5703, -497.2285, -258.17798, -503.50574, -281.16574, -524.6913, -337.56625, -390.791]
2025-05-07 23:14:20,047 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 23:14:20,050 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 24/100 (estimated time remaining: 3 hours, 38 minutes, 29 seconds)
2025-05-07 23:16:58,741 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 23:17:10,209 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -325.50024 ± 77.063
2025-05-07 23:17:10,209 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-264.08405, -351.2547, -301.09213, -408.88797, -319.21854, -310.4603, -457.0878, -260.62326, -398.97116, -183.32263]
2025-05-07 23:17:10,209 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 23:17:10,212 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 25/100 (estimated time remaining: 3 hours, 35 minutes, 44 seconds)
2025-05-07 23:19:48,706 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 23:20:00,161 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -362.82913 ± 77.931
2025-05-07 23:20:00,161 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-313.48438, -388.45453, -412.7924, -267.55978, -446.64352, -423.59616, -259.63495, -260.25058, -481.3253, -374.54974]
2025-05-07 23:20:00,161 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 23:20:00,164 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 26/100 (estimated time remaining: 3 hours, 32 minutes, 46 seconds)
2025-05-07 23:22:38,805 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 23:22:50,316 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -344.28427 ± 57.415
2025-05-07 23:22:50,316 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-406.613, -369.4643, -362.57526, -294.87894, -243.25601, -372.42514, -410.78864, -248.61589, -361.8948, -372.33072]
2025-05-07 23:22:50,316 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 23:22:50,319 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 27/100 (estimated time remaining: 3 hours, 29 minutes, 56 seconds)
2025-05-07 23:25:28,571 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 23:25:40,017 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -360.75821 ± 83.333
2025-05-07 23:25:40,017 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-327.5233, -334.45105, -359.8058, -573.7171, -374.19122, -254.13264, -419.9746, -357.11, -285.56006, -321.11624]
2025-05-07 23:25:40,017 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 23:25:40,020 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 28/100 (estimated time remaining: 3 hours, 26 minutes, 59 seconds)
2025-05-07 23:28:18,812 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 23:28:30,318 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -326.09949 ± 76.999
2025-05-07 23:28:30,318 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-381.87604, -252.23041, -281.2791, -374.5639, -299.03604, -348.6248, -402.56583, -181.93922, -455.2786, -283.601]
2025-05-07 23:28:30,318 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 23:28:30,322 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 29/100 (estimated time remaining: 3 hours, 24 minutes, 3 seconds)
2025-05-07 23:31:09,014 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 23:31:20,462 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -316.57840 ± 97.035
2025-05-07 23:31:20,462 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-274.8477, -316.30573, -349.37518, -369.48358, -281.64478, -199.72444, -531.52814, -161.74945, -374.76736, -306.35767]
2025-05-07 23:31:20,462 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 23:31:20,465 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 30/100 (estimated time remaining: 3 hours, 21 minutes, 13 seconds)
2025-05-07 23:33:58,871 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 23:34:10,235 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -287.79892 ± 59.400
2025-05-07 23:34:10,235 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-349.42422, -234.32912, -297.2416, -402.00513, -230.806, -345.01498, -198.11015, -264.24994, -288.33945, -268.46884]
2025-05-07 23:34:10,235 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 23:34:10,239 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 31/100 (estimated time remaining: 3 hours, 18 minutes, 21 seconds)
2025-05-07 23:36:49,114 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 23:37:00,457 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -262.79431 ± 60.270
2025-05-07 23:37:00,458 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-322.7327, -256.65866, -282.48514, -311.3071, -229.05992, -154.75752, -306.3122, -343.9688, -250.05183, -170.60934]
2025-05-07 23:37:00,458 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 23:37:00,462 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 32/100 (estimated time remaining: 3 hours, 15 minutes, 31 seconds)
2025-05-07 23:39:39,133 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 23:39:50,321 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -380.43985 ± 96.494
2025-05-07 23:39:50,321 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-381.69653, -295.44897, -583.21216, -373.68964, -369.55148, -184.63956, -375.30887, -436.71722, -434.76105, -369.37277]
2025-05-07 23:39:50,322 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 23:39:50,326 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 33/100 (estimated time remaining: 3 hours, 12 minutes, 44 seconds)
2025-05-07 23:42:29,059 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 23:42:40,511 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -308.28879 ± 81.456
2025-05-07 23:42:40,511 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-441.03415, -232.25513, -472.3412, -262.59833, -307.6417, -277.144, -238.01997, -298.55887, -331.1529, -222.1418]
2025-05-07 23:42:40,511 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 23:42:40,515 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 34/100 (estimated time remaining: 3 hours, 9 minutes, 52 seconds)
2025-05-07 23:45:19,395 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 23:45:30,536 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -282.92914 ± 24.114
2025-05-07 23:45:30,536 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-282.53195, -260.3168, -294.94128, -228.43906, -326.80106, -277.91034, -290.30875, -286.93594, -295.71927, -285.38696]
2025-05-07 23:45:30,536 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 23:45:30,540 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 35/100 (estimated time remaining: 3 hours, 7 minutes)
2025-05-07 23:48:07,428 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 23:48:18,642 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -252.13802 ± 91.755
2025-05-07 23:48:18,642 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-174.5384, -423.07675, -417.0408, -311.10654, -224.2587, -190.43301, -186.0422, -211.6059, -205.08813, -178.18971]
2025-05-07 23:48:18,642 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 23:48:18,646 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 36/100 (estimated time remaining: 3 hours, 3 minutes, 49 seconds)
2025-05-07 23:50:55,105 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 23:51:06,251 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -248.13203 ± 47.737
2025-05-07 23:51:06,252 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-224.39377, -197.8383, -309.61218, -244.23094, -323.84305, -193.94391, -264.71628, -304.42474, -228.52438, -189.7927]
2025-05-07 23:51:06,252 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 23:51:06,256 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 37/100 (estimated time remaining: 3 hours, 26 seconds)
2025-05-07 23:53:41,963 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 23:53:53,159 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -315.14658 ± 80.292
2025-05-07 23:53:53,159 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-252.49478, -277.76074, -281.74112, -204.25119, -263.08014, -266.02933, -467.26773, -390.85413, -324.95163, -423.03522]
2025-05-07 23:53:53,159 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 23:53:53,164 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 56 minutes, 59 seconds)
2025-05-07 23:56:28,178 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 23:56:39,452 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -283.85312 ± 70.399
2025-05-07 23:56:39,452 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-321.34314, -233.39993, -334.25656, -260.5984, -176.11421, -266.8404, -259.33664, -209.4129, -352.98828, -424.24066]
2025-05-07 23:56:39,452 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 23:56:39,456 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 53 minutes, 22 seconds)
2025-05-07 23:59:13,120 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 23:59:24,402 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -315.49716 ± 35.022
2025-05-07 23:59:24,402 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-377.5287, -330.3875, -289.47244, -274.99197, -312.6253, -360.42938, -294.4368, -336.0709, -260.93292, -318.09607]
2025-05-07 23:59:24,402 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 23:59:24,406 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 49 minutes, 33 seconds)
2025-05-08 00:01:57,509 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 00:02:08,838 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -245.28329 ± 73.796
2025-05-08 00:02:08,838 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-382.39984, -133.5357, -301.65552, -314.81848, -288.34143, -161.02933, -212.94102, -197.35918, -195.61354, -265.13904]
2025-05-08 00:02:08,838 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 00:02:08,842 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 41/100 (estimated time remaining: 2 hours, 46 minutes, 2 seconds)
2025-05-08 00:04:41,782 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 00:04:53,161 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -254.45601 ± 63.705
2025-05-08 00:04:53,161 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-264.94775, -309.43442, -102.96262, -212.74393, -288.38852, -316.04285, -301.70517, -262.41644, -294.2222, -191.69629]
2025-05-08 00:04:53,161 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 00:04:53,166 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 42/100 (estimated time remaining: 2 hours, 42 minutes, 37 seconds)
2025-05-08 00:07:25,948 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 00:07:37,363 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -298.92767 ± 47.028
2025-05-08 00:07:37,363 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-337.1457, -299.88937, -269.22226, -403.09317, -330.94263, -227.38283, -274.0714, -269.47357, -312.44424, -265.61157]
2025-05-08 00:07:37,363 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 00:07:37,367 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 43/100 (estimated time remaining: 2 hours, 39 minutes, 20 seconds)
2025-05-08 00:10:10,165 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 00:10:21,545 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -317.17932 ± 88.231
2025-05-08 00:10:21,546 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-424.13055, -236.71915, -209.91443, -363.99286, -239.62125, -261.24966, -282.53186, -266.62173, -416.85297, -470.1587]
2025-05-08 00:10:21,546 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 00:10:21,551 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 44/100 (estimated time remaining: 2 hours, 36 minutes, 11 seconds)
2025-05-08 00:12:54,457 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 00:13:05,836 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -334.48990 ± 69.646
2025-05-08 00:13:05,836 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-449.4681, -338.36594, -228.17706, -275.0801, -446.42728, -350.77548, -307.49783, -319.83362, -260.2687, -369.0046]
2025-05-08 00:13:05,836 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 00:13:05,842 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 45/100 (estimated time remaining: 2 hours, 33 minutes, 20 seconds)
2025-05-08 00:15:38,883 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 00:15:50,188 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -336.49359 ± 70.581
2025-05-08 00:15:50,188 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-475.1737, -267.17123, -297.19046, -422.92038, -278.43716, -360.2474, -316.62418, -399.01465, -293.24005, -254.91644]
2025-05-08 00:15:50,188 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 00:15:50,193 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 46/100 (estimated time remaining: 2 hours, 30 minutes, 34 seconds)
2025-05-08 00:18:22,739 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 00:18:34,112 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -345.89575 ± 59.119
2025-05-08 00:18:34,112 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-416.6599, -347.1545, -248.26941, -364.44888, -358.73666, -260.98425, -373.3478, -447.9591, -331.89105, -309.5059]
2025-05-08 00:18:34,112 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 00:18:34,116 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 47/100 (estimated time remaining: 2 hours, 27 minutes, 46 seconds)
2025-05-08 00:21:06,536 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 00:21:17,880 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -353.12836 ± 133.439
2025-05-08 00:21:17,880 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-428.61282, -234.62483, -167.38943, -431.41486, -312.42612, -298.88498, -608.8985, -360.84637, -498.12314, -190.06274]
2025-05-08 00:21:17,880 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 00:21:17,885 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 48/100 (estimated time remaining: 2 hours, 24 minutes, 57 seconds)
2025-05-08 00:23:50,776 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 00:24:02,104 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -306.74176 ± 85.294
2025-05-08 00:24:02,104 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-317.5638, -251.08278, -333.31882, -201.85414, -289.36288, -325.97278, -413.00296, -215.9941, -233.07425, -486.19104]
2025-05-08 00:24:02,104 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 00:24:02,109 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 49/100 (estimated time remaining: 2 hours, 22 minutes, 13 seconds)
2025-05-08 00:26:34,665 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 00:26:46,016 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -396.50021 ± 107.001
2025-05-08 00:26:46,016 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-331.99518, -450.73416, -416.9851, -458.16757, -423.14923, -214.49707, -623.06866, -326.21307, -287.95615, -432.23618]
2025-05-08 00:26:46,016 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 00:26:46,022 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 50/100 (estimated time remaining: 2 hours, 19 minutes, 25 seconds)
2025-05-08 00:29:18,343 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 00:29:29,669 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -187.55930 ± 44.389
2025-05-08 00:29:29,669 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-160.94165, -184.48888, -241.5545, -165.77736, -255.93068, -155.1714, -231.54938, -215.56686, -155.18227, -109.430115]
2025-05-08 00:29:29,670 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 00:29:29,674 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 51/100 (estimated time remaining: 2 hours, 16 minutes, 34 seconds)
2025-05-08 00:32:02,228 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 00:32:13,553 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -319.32040 ± 51.124
2025-05-08 00:32:13,553 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-363.2839, -276.57858, -416.71524, -341.06058, -225.75642, -302.57663, -337.6882, -306.60403, -274.31024, -348.63022]
2025-05-08 00:32:13,554 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 00:32:13,559 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 52/100 (estimated time remaining: 2 hours, 13 minutes, 50 seconds)
2025-05-08 00:34:58,022 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 00:35:09,242 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -292.72614 ± 61.358
2025-05-08 00:35:09,242 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-436.7677, -254.52548, -222.8716, -225.57005, -293.77216, -309.136, -344.8057, -242.45915, -314.01895, -283.33463]
2025-05-08 00:35:09,242 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 00:35:09,248 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 53/100 (estimated time remaining: 2 hours, 13 minutes, 1 second)
2025-05-08 00:37:54,092 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 00:38:05,591 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -246.78613 ± 42.736
2025-05-08 00:38:05,591 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-266.4942, -239.82024, -214.89505, -148.87956, -274.2361, -235.47244, -270.92322, -299.97015, -295.00623, -222.16402]
2025-05-08 00:38:05,591 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 00:38:05,597 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 54/100 (estimated time remaining: 2 hours, 12 minutes, 8 seconds)
2025-05-08 00:40:50,193 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 00:41:01,695 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -285.66483 ± 133.151
2025-05-08 00:41:01,695 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-344.97012, -182.71396, -290.5251, -357.01715, -283.7006, -602.4909, -136.168, -181.46713, -338.99255, -138.60274]
2025-05-08 00:41:01,695 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 00:41:01,701 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 55/100 (estimated time remaining: 2 hours, 11 minutes, 12 seconds)
2025-05-08 00:43:46,205 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 00:43:57,658 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -328.18799 ± 69.714
2025-05-08 00:43:57,658 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-259.31042, -273.8057, -413.12363, -447.58215, -396.15958, -316.5607, -274.73254, -378.52817, -249.52098, -272.55615]
2025-05-08 00:43:57,658 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 00:43:57,663 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 56/100 (estimated time remaining: 2 hours, 10 minutes, 11 seconds)
2025-05-08 00:46:42,148 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 00:46:53,360 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -233.20642 ± 81.371
2025-05-08 00:46:53,360 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-349.1512, -145.23369, -301.49023, -138.94429, -119.51001, -199.22151, -268.26852, -296.9642, -182.4542, -330.82626]
2025-05-08 00:46:53,360 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 00:46:53,367 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 57/100 (estimated time remaining: 2 hours, 9 minutes, 2 seconds)
2025-05-08 00:49:38,047 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 00:49:49,537 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -300.11371 ± 73.897
2025-05-08 00:49:49,537 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-284.10703, -239.15175, -176.61743, -300.77664, -446.65176, -275.7907, -404.65054, -320.18405, -297.60913, -255.59827]
2025-05-08 00:49:49,537 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 00:49:49,542 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 58/100 (estimated time remaining: 2 hours, 6 minutes, 10 seconds)
2025-05-08 00:52:33,884 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 00:52:45,141 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -295.79691 ± 92.182
2025-05-08 00:52:45,141 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-256.48563, -389.95215, -270.37308, -163.60872, -221.57056, -199.79419, -316.4298, -271.8102, -469.56085, -398.38382]
2025-05-08 00:52:45,141 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 00:52:45,148 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 59/100 (estimated time remaining: 2 hours, 3 minutes, 8 seconds)
2025-05-08 00:55:29,515 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 00:55:40,856 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -287.10321 ± 69.244
2025-05-08 00:55:40,856 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-226.03752, -375.34595, -273.83893, -197.74234, -243.3544, -354.47656, -317.42236, -373.99173, -329.28146, -179.5409]
2025-05-08 00:55:40,856 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 00:55:40,862 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 9 seconds)
2025-05-08 00:58:25,302 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 00:58:36,644 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -161.67529 ± 103.920
2025-05-08 00:58:36,644 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-186.05777, -113.35278, -170.39021, -66.756, -90.53343, -333.7243, -16.511417, -336.6328, -220.03598, -82.75823]
2025-05-08 00:58:36,644 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 00:58:36,651 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 57 minutes, 11 seconds)
2025-05-08 01:01:20,818 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 01:01:32,019 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -270.44559 ± 33.125
2025-05-08 01:01:32,019 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-266.03757, -257.65167, -286.90216, -208.20494, -245.04678, -330.97192, -285.26575, -238.21126, -298.20813, -287.95584]
2025-05-08 01:01:32,019 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 01:01:32,026 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 54 minutes, 13 seconds)
2025-05-08 01:04:16,460 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 01:04:27,714 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -255.68015 ± 72.288
2025-05-08 01:04:27,715 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-318.83194, -135.6127, -411.35547, -231.42006, -212.40892, -277.064, -225.49152, -187.30534, -273.35623, -283.95523]
2025-05-08 01:04:27,715 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 01:04:27,722 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 51 minutes, 14 seconds)
2025-05-08 01:07:12,048 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 01:07:23,542 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -301.84641 ± 37.694
2025-05-08 01:07:23,542 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-337.6437, -327.6581, -364.79373, -252.81685, -249.79875, -321.9693, -258.61975, -308.4871, -318.8722, -277.80496]
2025-05-08 01:07:23,542 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 01:07:23,548 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 48 minutes, 20 seconds)
2025-05-08 01:10:08,127 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 01:10:19,456 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -255.12163 ± 56.509
2025-05-08 01:10:19,456 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-215.70511, -281.4908, -165.5292, -243.5767, -269.1173, -175.34067, -234.69492, -341.8496, -328.74213, -295.1698]
2025-05-08 01:10:19,456 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 01:10:19,464 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 45 minutes, 25 seconds)
2025-05-08 01:13:04,358 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 01:13:15,768 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -291.27954 ± 63.802
2025-05-08 01:13:15,768 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-318.64267, -282.59613, -238.32147, -380.68054, -363.39746, -238.27951, -236.86533, -199.0302, -268.40668, -386.57532]
2025-05-08 01:13:15,768 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 01:13:15,775 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 42 minutes, 33 seconds)
2025-05-08 01:16:00,503 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 01:16:11,811 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -238.55002 ± 75.535
2025-05-08 01:16:11,811 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-247.95486, -211.60104, -261.46356, -251.87007, -107.69223, -150.76839, -212.22144, -260.89618, -406.55576, -274.47678]
2025-05-08 01:16:11,811 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 01:16:11,817 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 39 minutes, 42 seconds)
2025-05-08 01:18:56,089 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 01:19:07,344 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -298.22931 ± 47.972
2025-05-08 01:19:07,344 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-287.7602, -293.1404, -299.18658, -222.89865, -316.95798, -314.59723, -212.02966, -385.20813, -333.2165, -317.2978]
2025-05-08 01:19:07,344 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 01:19:07,352 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 36 minutes, 45 seconds)
2025-05-08 01:21:51,468 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 01:22:02,674 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -257.36328 ± 82.795
2025-05-08 01:22:02,674 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-164.11415, -233.58835, -421.6585, -191.62868, -242.66917, -295.33844, -282.72855, -136.55107, -361.71667, -243.63918]
2025-05-08 01:22:02,674 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 01:22:02,682 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 33 minutes, 46 seconds)
2025-05-08 01:24:46,745 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 01:24:57,988 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -244.52754 ± 73.555
2025-05-08 01:24:57,988 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-185.85104, -72.73319, -299.2594, -268.375, -234.84276, -313.19977, -286.27414, -183.89578, -276.90836, -323.93582]
2025-05-08 01:24:57,989 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 01:24:57,995 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 30 minutes, 46 seconds)
2025-05-08 01:27:42,197 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 01:27:53,565 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -264.39847 ± 38.772
2025-05-08 01:27:53,565 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-259.8234, -287.31223, -235.01143, -292.848, -335.81802, -272.85605, -255.55309, -190.53703, -288.7462, -225.4793]
2025-05-08 01:27:53,565 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 01:27:53,572 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 27 minutes, 46 seconds)
2025-05-08 01:30:46,658 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 01:30:58,000 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -257.67126 ± 36.077
2025-05-08 01:30:58,000 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-280.96365, -306.5582, -268.86417, -251.8366, -202.49603, -208.43816, -215.70496, -267.68997, -307.86487, -266.29623]
2025-05-08 01:30:58,000 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 01:30:58,008 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 25 minutes, 39 seconds)
2025-05-08 01:33:31,200 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 01:33:42,519 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -286.06665 ± 54.055
2025-05-08 01:33:42,520 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-264.0824, -217.8764, -281.5547, -262.65164, -409.99258, -316.9275, -268.31757, -232.01424, -262.67468, -344.57483]
2025-05-08 01:33:42,520 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 01:33:42,528 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 21 minutes, 40 seconds)
2025-05-08 01:36:15,977 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 01:36:27,406 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -262.34622 ± 63.842
2025-05-08 01:36:27,406 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-314.50684, -306.60703, -231.64998, -210.08688, -298.8983, -363.17554, -207.80054, -137.25368, -307.27106, -246.21222]
2025-05-08 01:36:27,406 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 01:36:27,413 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 17 minutes, 49 seconds)
2025-05-08 01:39:01,137 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 01:39:12,543 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -278.38092 ± 50.219
2025-05-08 01:39:12,543 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-243.4837, -256.0488, -216.52362, -361.13132, -250.85983, -359.4521, -249.11151, -233.7933, -283.78418, -329.62073]
2025-05-08 01:39:12,543 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 01:39:12,550 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 14 minutes, 3 seconds)
2025-05-08 01:41:46,258 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 01:41:57,657 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -283.46918 ± 53.821
2025-05-08 01:41:57,657 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-295.94156, -229.34456, -291.06165, -243.42107, -223.2504, -255.5238, -361.27582, -267.4325, -398.7262, -268.71432]
2025-05-08 01:41:57,657 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 01:41:57,664 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 10 minutes, 20 seconds)
2025-05-08 01:44:31,281 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 01:44:42,723 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -235.57500 ± 55.062
2025-05-08 01:44:42,723 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-147.89418, -243.44566, -192.97626, -328.96655, -200.52385, -237.68001, -294.82507, -199.29546, -204.026, -306.11697]
2025-05-08 01:44:42,723 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 01:44:42,731 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 5 minutes, 58 seconds)
2025-05-08 01:47:16,452 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 01:47:27,924 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -247.06889 ± 48.918
2025-05-08 01:47:27,925 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-220.89719, -349.40054, -220.01666, -254.18169, -265.86874, -209.50175, -256.7503, -268.20502, -152.2904, -273.57663]
2025-05-08 01:47:27,925 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 01:47:27,932 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 3 minutes, 16 seconds)
2025-05-08 01:50:01,685 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 01:50:13,130 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -193.04599 ± 53.561
2025-05-08 01:50:13,130 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-181.45251, -161.07106, -206.55542, -164.94897, -122.33685, -211.56055, -268.35504, -167.03389, -142.93759, -304.2082]
2025-05-08 01:50:13,130 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 01:50:13,137 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 33 seconds)
2025-05-08 01:52:46,929 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 01:52:58,213 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -319.68365 ± 58.657
2025-05-08 01:52:58,214 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-357.13776, -319.5746, -277.2947, -241.14291, -319.8303, -326.3301, -242.91386, -457.03748, -334.94174, -320.63303]
2025-05-08 01:52:58,214 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 01:52:58,222 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 80/100 (estimated time remaining: 57 minutes, 47 seconds)
2025-05-08 01:55:32,030 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 01:55:43,339 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -354.24176 ± 82.877
2025-05-08 01:55:43,339 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-551.58295, -381.01614, -376.5979, -349.9927, -259.94437, -248.40651, -338.5943, -415.70175, -326.54776, -294.0333]
2025-05-08 01:55:43,339 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 01:55:43,346 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 81/100 (estimated time remaining: 55 minutes, 2 seconds)
2025-05-08 01:58:17,283 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 01:58:28,751 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -259.55707 ± 36.074
2025-05-08 01:58:28,751 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-250.12047, -280.94623, -261.7596, -188.92017, -230.66327, -313.77045, -317.6253, -258.13116, -246.755, -246.87947]
2025-05-08 01:58:28,751 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 01:58:28,760 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 82/100 (estimated time remaining: 52 minutes, 18 seconds)
2025-05-08 02:01:02,431 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 02:01:13,792 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -262.01730 ± 61.793
2025-05-08 02:01:13,793 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-278.061, -337.1235, -384.89432, -224.2823, -276.1432, -289.09555, -229.90685, -231.76863, -163.8243, -205.0736]
2025-05-08 02:01:13,793 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 02:01:13,800 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 83/100 (estimated time remaining: 49 minutes, 33 seconds)
2025-05-08 02:03:47,117 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 02:03:58,491 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -283.94485 ± 59.039
2025-05-08 02:03:58,492 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-246.9152, -328.37512, -391.67188, -238.15038, -304.14478, -307.14655, -196.91624, -352.42352, -227.79993, -245.90508]
2025-05-08 02:03:58,492 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 02:03:58,499 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 84/100 (estimated time remaining: 46 minutes, 46 seconds)
2025-05-08 02:06:31,796 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 02:06:43,135 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -285.61404 ± 54.211
2025-05-08 02:06:43,135 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-294.6806, -289.16153, -202.52843, -307.5516, -323.70248, -370.36847, -258.5278, -348.66968, -195.23085, -265.71902]
2025-05-08 02:06:43,135 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 02:06:43,143 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 85/100 (estimated time remaining: 43 minutes, 59 seconds)
2025-05-08 02:09:16,793 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 02:09:28,154 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -245.72531 ± 69.151
2025-05-08 02:09:28,154 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-333.0925, -201.94342, -223.16866, -243.88257, -173.39917, -117.31923, -318.31348, -337.52255, -287.42917, -221.18236]
2025-05-08 02:09:28,154 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 02:09:28,164 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 86/100 (estimated time remaining: 41 minutes, 14 seconds)
2025-05-08 02:12:01,801 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 02:12:13,300 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -371.34372 ± 57.140
2025-05-08 02:12:13,300 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-320.69843, -325.42554, -418.8694, -515.2309, -366.43466, -333.74994, -321.9765, -362.73337, -351.2473, -397.07123]
2025-05-08 02:12:13,300 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 02:12:13,308 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 87/100 (estimated time remaining: 38 minutes, 28 seconds)
2025-05-08 02:14:46,612 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 02:14:57,966 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -258.20770 ± 71.632
2025-05-08 02:14:57,966 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-280.86853, -249.41548, -259.08206, -171.62735, -230.63156, -284.2265, -153.59828, -230.77545, -293.88766, -427.96432]
2025-05-08 02:14:57,966 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 02:14:57,974 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 88/100 (estimated time remaining: 35 minutes, 42 seconds)
2025-05-08 02:17:31,087 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 02:17:42,424 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -308.65829 ± 70.487
2025-05-08 02:17:42,424 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-308.78665, -235.71478, -345.2402, -276.71222, -248.83633, -237.16103, -394.34528, -267.88672, -464.5695, -307.3302]
2025-05-08 02:17:42,424 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 02:17:42,432 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 89/100 (estimated time remaining: 32 minutes, 57 seconds)
2025-05-08 02:20:15,588 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 02:20:27,077 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -417.53070 ± 56.157
2025-05-08 02:20:27,077 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-379.6168, -369.09778, -408.42825, -412.56277, -344.8099, -487.52908, -413.64706, -519.6155, -360.80203, -479.1979]
2025-05-08 02:20:27,077 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 02:20:27,085 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 90/100 (estimated time remaining: 30 minutes, 12 seconds)
2025-05-08 02:23:00,280 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 02:23:11,852 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -263.53534 ± 51.197
2025-05-08 02:23:11,852 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-244.07716, -236.38268, -190.23595, -247.48906, -273.43707, -312.42874, -217.67026, -377.80957, -238.11128, -297.7115]
2025-05-08 02:23:11,852 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 02:23:11,862 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 91/100 (estimated time remaining: 27 minutes, 27 seconds)
2025-05-08 02:25:45,071 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 02:25:56,456 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -268.14551 ± 74.009
2025-05-08 02:25:56,456 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-339.09723, -336.03757, -354.97333, -277.70032, -113.21857, -265.2899, -287.58823, -155.81781, -289.5497, -262.18225]
2025-05-08 02:25:56,456 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 02:25:56,464 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 92/100 (estimated time remaining: 24 minutes, 41 seconds)
2025-05-08 02:28:29,734 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 02:28:41,122 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -266.12869 ± 40.405
2025-05-08 02:28:41,122 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-227.3887, -241.50145, -253.47528, -220.74078, -316.35324, -355.03394, -252.9254, -240.75365, -292.92123, -260.19336]
2025-05-08 02:28:41,122 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 02:28:41,132 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 93/100 (estimated time remaining: 21 minutes, 57 seconds)
2025-05-08 02:31:15,583 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 02:31:27,043 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -306.60925 ± 36.830
2025-05-08 02:31:27,043 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-335.01196, -349.4628, -303.93665, -337.1315, -343.3694, -248.73758, -334.33694, -251.44351, -291.1259, -271.53647]
2025-05-08 02:31:27,043 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 02:31:27,053 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 94/100 (estimated time remaining: 19 minutes, 14 seconds)
2025-05-08 02:34:01,833 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 02:34:13,302 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -289.81226 ± 33.577
2025-05-08 02:34:13,302 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-282.85568, -325.37296, -317.5118, -293.45233, -246.9215, -288.21146, -313.4679, -234.30875, -340.22113, -255.799]
2025-05-08 02:34:13,302 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 02:34:13,311 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 95/100 (estimated time remaining: 16 minutes, 31 seconds)
2025-05-08 02:36:47,815 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 02:36:59,290 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -311.09119 ± 22.199
2025-05-08 02:36:59,290 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-341.3666, -298.19968, -315.59882, -326.56964, -276.56342, -336.06985, -322.58896, -310.47214, -270.67523, -312.80728]
2025-05-08 02:36:59,290 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 02:36:59,299 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 96/100 (estimated time remaining: 13 minutes, 47 seconds)
2025-05-08 02:39:33,707 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 02:39:45,206 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -313.60425 ± 56.668
2025-05-08 02:39:45,206 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-314.39905, -397.62103, -192.49356, -327.61014, -259.18445, -343.7927, -382.60748, -335.39056, -298.54514, -284.39825]
2025-05-08 02:39:45,207 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 02:39:45,215 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 97/100 (estimated time remaining: 11 minutes, 3 seconds)
2025-05-08 02:42:19,567 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 02:42:31,084 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -301.72699 ± 29.848
2025-05-08 02:42:31,085 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-301.9692, -335.24295, -353.43433, -287.64188, -274.70447, -256.4697, -265.0399, -309.02948, -305.0904, -328.64804]
2025-05-08 02:42:31,085 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 02:42:31,093 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 98/100 (estimated time remaining: 8 minutes, 17 seconds)
2025-05-08 02:45:05,601 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 02:45:16,962 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -266.37244 ± 58.410
2025-05-08 02:45:16,963 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-262.0056, -289.74872, -103.68693, -302.1715, -260.5172, -252.12196, -286.80838, -301.62622, -327.946, -277.09183]
2025-05-08 02:45:16,963 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 02:45:16,973 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 99/100 (estimated time remaining: 5 minutes, 31 seconds)
2025-05-08 02:47:51,454 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 02:48:03,038 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -370.90787 ± 38.463
2025-05-08 02:48:03,039 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-387.02243, -369.1125, -408.95023, -376.59662, -404.79056, -333.6452, -280.60852, -357.27197, -373.61002, -417.47043]
2025-05-08 02:48:03,039 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 02:48:03,049 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1097 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 45 seconds)
2025-05-08 02:50:37,582 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 02:50:49,141 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -254.50244 ± 65.570
2025-05-08 02:50:49,141 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-218.9953, -239.31271, -130.98659, -215.81131, -335.78888, -348.5192, -227.43648, -207.86813, -300.10107, -320.2046]
2025-05-08 02:50:49,141 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-08 02:50:49,152 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1149 [DEBUG]: Training session finished
