2025-05-09 13:34:55,964 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-sac
2025-05-09 13:34:55,964 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-sac
2025-05-09 13:34:55,964 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x79a1e583cf70>}
2025-05-09 13:34:55,964 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1111 [DEBUG]: using device: cpu
2025-05-09 13:34:55,969 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1133 [INFO]: Creating new trainer
2025-05-09 13:34:55,975 baseline-sac-noisy-halfcheetah:111 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=17, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-05-09 13:34:55,975 baseline-sac-noisy-halfcheetah:112 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-09 13:34:56,759 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1194 [DEBUG]: Starting training session...
2025-05-09 13:34:56,760 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 1/100
2025-05-09 13:37:20,639 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 13:37:33,344 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -362.79916 ± 52.916
2025-05-09 13:37:33,344 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-363.17358, -311.0794, -330.76227, -353.26816, -385.09332, -326.05545, -323.37686, -330.71515, -408.75433, -495.71295]
2025-05-09 13:37:33,344 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 13:37:33,344 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (-362.80) for latency MM1Queue_a033_s075
2025-05-09 13:37:33,345 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-09 13:37:33,348 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-sac/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 13:37:33,352 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 4 hours, 18 minutes, 22 seconds)
2025-05-09 13:40:07,376 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 13:40:20,087 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -226.88843 ± 67.807
2025-05-09 13:40:20,087 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-215.21603, -144.84737, -173.81233, -371.4477, -320.20184, -186.19322, -181.0469, -267.82275, -193.63174, -214.66435]
2025-05-09 13:40:20,087 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 13:40:20,087 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (-226.89) for latency MM1Queue_a033_s075
2025-05-09 13:40:20,087 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-09 13:40:20,090 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-sac/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 13:40:20,095 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 4 hours, 24 minutes, 3 seconds)
2025-05-09 13:42:54,947 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 13:43:07,706 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -244.65279 ± 79.774
2025-05-09 13:43:07,706 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-234.79805, -278.7324, -328.1988, -198.60161, -352.42, -48.107906, -292.53183, -217.93108, -250.04471, -245.16145]
2025-05-09 13:43:07,706 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 13:43:07,708 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 4 hours, 24 minutes, 33 seconds)
2025-05-09 13:45:40,529 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 13:45:53,308 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -230.55486 ± 53.160
2025-05-09 13:45:53,308 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-265.49005, -216.88008, -202.20529, -237.36038, -350.55933, -254.03296, -134.08601, -191.7146, -216.5091, -236.71083]
2025-05-09 13:45:53,308 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 13:45:53,310 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 4 hours, 22 minutes, 37 seconds)
2025-05-09 13:48:27,336 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 13:48:40,132 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -188.50282 ± 47.792
2025-05-09 13:48:40,132 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-96.93364, -239.04771, -228.25912, -106.63558, -199.10602, -206.40334, -224.394, -217.99121, -205.1508, -161.10684]
2025-05-09 13:48:40,132 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 13:48:40,133 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (-188.50) for latency MM1Queue_a033_s075
2025-05-09 13:48:40,133 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-09 13:48:40,136 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-sac/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 13:48:40,141 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 4 hours, 20 minutes, 44 seconds)
2025-05-09 13:51:12,383 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 13:51:25,128 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -189.76384 ± 50.691
2025-05-09 13:51:25,128 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-166.05647, -226.96895, -228.47, -171.99525, -179.30429, -229.5804, -75.78378, -146.1198, -258.39047, -214.96881]
2025-05-09 13:51:25,128 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 13:51:25,130 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 4 hours, 20 minutes, 37 seconds)
2025-05-09 13:53:57,824 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 13:54:10,541 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -186.06705 ± 34.214
2025-05-09 13:54:10,541 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-207.62585, -184.89244, -132.29794, -221.49118, -138.11598, -159.04796, -231.5518, -207.628, -161.50397, -216.51546]
2025-05-09 13:54:10,541 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 13:54:10,541 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (-186.07) for latency MM1Queue_a033_s075
2025-05-09 13:54:10,541 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-09 13:54:10,544 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-sac/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 13:54:10,550 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 4 hours, 17 minutes, 26 seconds)
2025-05-09 13:56:45,309 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 13:56:58,008 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -163.76772 ± 69.297
2025-05-09 13:56:58,008 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-218.07724, -210.50844, -68.37146, -219.44377, -192.17757, -200.5208, -228.49394, -88.84878, -28.27585, -182.9592]
2025-05-09 13:56:58,008 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 13:56:58,008 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (-163.77) for latency MM1Queue_a033_s075
2025-05-09 13:56:58,009 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-09 13:56:58,012 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-sac/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 13:56:58,018 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 4 hours, 14 minutes, 37 seconds)
2025-05-09 13:59:31,759 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 13:59:44,501 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -156.36319 ± 48.506
2025-05-09 13:59:44,501 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-181.73386, -197.36656, -143.2935, -153.3224, -150.35745, -164.52205, -164.81984, -207.67136, -23.059057, -177.48566]
2025-05-09 13:59:44,501 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 13:59:44,501 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (-156.36) for latency MM1Queue_a033_s075
2025-05-09 13:59:44,502 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-09 13:59:44,505 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-sac/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 13:59:44,512 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 4 hours, 12 minutes, 7 seconds)
2025-05-09 14:02:18,648 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 14:02:31,370 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -148.69370 ± 51.297
2025-05-09 14:02:31,370 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-86.7322, -138.15936, -174.4253, -196.92168, -155.3099, -200.44324, -200.4112, -109.44456, -42.45351, -182.63602]
2025-05-09 14:02:31,370 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 14:02:31,370 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (-148.69) for latency MM1Queue_a033_s075
2025-05-09 14:02:31,370 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-09 14:02:31,373 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-sac/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 14:02:31,379 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 4 hours, 9 minutes, 22 seconds)
2025-05-09 14:05:06,465 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 14:05:19,253 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -158.05948 ± 36.130
2025-05-09 14:05:19,253 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-188.45235, -186.72041, -166.96329, -180.78918, -166.54622, -101.563194, -132.26854, -156.7453, -207.98201, -92.56426]
2025-05-09 14:05:19,253 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 14:05:19,257 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 4 hours, 7 minutes, 27 seconds)
2025-05-09 14:07:52,350 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 14:08:05,131 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -161.09595 ± 43.536
2025-05-09 14:08:05,131 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-167.19476, -174.11565, -125.92492, -136.24385, -141.8087, -145.7712, -132.97423, -278.4984, -182.64386, -125.78389]
2025-05-09 14:08:05,131 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 14:08:05,134 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 4 hours, 4 minutes, 48 seconds)
2025-05-09 14:10:40,187 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 14:10:52,899 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -130.49423 ± 26.456
2025-05-09 14:10:52,899 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-129.94884, -161.97134, -128.30037, -109.50069, -141.1094, -185.81946, -127.49846, -86.44282, -110.664215, -123.686844]
2025-05-09 14:10:52,899 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 14:10:52,899 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (-130.49) for latency MM1Queue_a033_s075
2025-05-09 14:10:52,900 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-09 14:10:52,903 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-sac/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 14:10:52,909 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 4 hours, 2 minutes, 7 seconds)
2025-05-09 14:13:27,381 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 14:13:40,153 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -131.45166 ± 54.973
2025-05-09 14:13:40,153 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-194.87355, -118.71837, -139.93573, -126.83297, -208.1955, -142.86113, -8.339467, -94.09755, -180.4128, -100.24951]
2025-05-09 14:13:40,154 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 14:13:40,157 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 3 hours, 59 minutes, 33 seconds)
2025-05-09 14:16:14,247 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 14:16:27,011 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -143.41757 ± 27.017
2025-05-09 14:16:27,011 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-164.56007, -158.94678, -133.4714, -158.3647, -139.56946, -75.1147, -124.255516, -156.91568, -147.23555, -175.74193]
2025-05-09 14:16:27,011 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 14:16:27,014 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 3 hours, 56 minutes, 45 seconds)
2025-05-09 14:19:02,198 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 14:19:14,974 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -153.99460 ± 39.737
2025-05-09 14:19:14,975 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-128.18098, -160.3824, -121.12606, -149.9418, -217.56831, -105.09294, -146.32492, -218.07164, -106.131424, -187.12566]
2025-05-09 14:19:14,975 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 14:19:14,978 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 3 hours, 54 minutes)
2025-05-09 14:21:48,253 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 14:22:01,012 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -144.17255 ± 44.998
2025-05-09 14:22:01,012 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-47.62265, -132.97124, -178.35335, -160.73486, -120.14374, -133.952, -158.79886, -231.9764, -157.16339, -120.00913]
2025-05-09 14:22:01,012 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 14:22:01,016 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 3 hours, 51 minutes, 15 seconds)
2025-05-09 14:24:35,188 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 14:24:47,974 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -147.12213 ± 26.659
2025-05-09 14:24:47,975 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-206.35909, -139.7168, -161.1744, -108.89211, -121.09719, -131.8368, -124.74597, -164.2552, -156.24141, -156.90237]
2025-05-09 14:24:47,975 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 14:24:47,978 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 3 hours, 48 minutes, 15 seconds)
2025-05-09 14:27:23,004 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 14:27:35,761 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -136.52487 ± 59.657
2025-05-09 14:27:35,761 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-283.20636, -100.81851, -158.90796, -103.959526, -145.4421, -106.08979, -189.11198, -60.99952, -117.73651, -98.97654]
2025-05-09 14:27:35,761 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 14:27:35,765 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 3 hours, 45 minutes, 36 seconds)
2025-05-09 14:30:09,946 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 14:30:22,716 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -106.96338 ± 25.998
2025-05-09 14:30:22,716 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-119.81377, -113.3149, -120.61704, -71.94084, -107.98909, -98.17928, -159.20255, -94.58651, -62.56759, -121.42221]
2025-05-09 14:30:22,716 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 14:30:22,716 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (-106.96) for latency MM1Queue_a033_s075
2025-05-09 14:30:22,717 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-09 14:30:22,720 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-sac/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 14:30:22,727 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 3 hours, 42 minutes, 51 seconds)
2025-05-09 14:32:56,726 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 14:33:09,464 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -115.53298 ± 47.676
2025-05-09 14:33:09,464 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-98.74347, -27.475962, -91.14458, -136.61911, -53.631325, -140.93121, -135.54834, -121.44927, -145.78087, -204.00566]
2025-05-09 14:33:09,465 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 14:33:09,469 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 3 hours, 39 minutes, 44 seconds)
2025-05-09 14:35:44,368 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 14:35:57,132 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -132.13701 ± 28.803
2025-05-09 14:35:57,132 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-125.466324, -180.60579, -179.14928, -152.22798, -133.0991, -124.042816, -110.03907, -107.33641, -122.21415, -87.18926]
2025-05-09 14:35:57,132 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 14:35:57,137 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 3 hours, 37 minutes, 23 seconds)
2025-05-09 14:38:30,369 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 14:38:43,146 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -83.89424 ± 43.349
2025-05-09 14:38:43,146 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-65.250854, -9.168031, -126.68591, -28.742449, -78.40227, -81.07101, -117.602234, -157.43604, -113.566635, -61.016968]
2025-05-09 14:38:43,146 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 14:38:43,147 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (-83.89) for latency MM1Queue_a033_s075
2025-05-09 14:38:43,147 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-09 14:38:43,150 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-sac/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 14:38:43,158 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 3 hours, 34 minutes, 21 seconds)
2025-05-09 14:41:15,984 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 14:41:28,785 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -140.43800 ± 37.231
2025-05-09 14:41:28,785 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-95.73529, -154.13791, -163.23206, -151.90881, -53.36578, -123.5272, -168.7525, -157.92726, -151.44344, -184.34973]
2025-05-09 14:41:28,785 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 14:41:28,790 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 3 hours, 31 minutes, 1 second)
2025-05-09 14:43:59,931 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 14:44:12,672 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -105.21696 ± 28.711
2025-05-09 14:44:12,673 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-106.73704, -112.17651, -90.29544, -93.06132, -178.28857, -94.494026, -75.16162, -74.26938, -100.47199, -127.21385]
2025-05-09 14:44:12,673 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 14:44:12,677 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 3 hours, 27 minutes, 29 seconds)
2025-05-09 14:46:43,905 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 14:46:56,710 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -146.25656 ± 18.792
2025-05-09 14:46:56,710 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-134.79482, -116.01603, -158.61905, -186.01186, -154.84872, -150.74481, -126.458984, -157.8111, -141.49922, -135.76102]
2025-05-09 14:46:56,710 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 14:46:56,715 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 3 hours, 24 minutes, 3 seconds)
2025-05-09 14:49:27,123 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 14:49:39,863 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -92.97227 ± 77.343
2025-05-09 14:49:39,863 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-144.02652, -70.47399, -305.10287, -50.978443, -49.318943, -82.00235, -91.95125, -53.04404, -63.837833, -18.986382]
2025-05-09 14:49:39,863 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 14:49:39,868 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 3 hours, 20 minutes, 11 seconds)
2025-05-09 14:52:10,332 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 14:52:23,089 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -120.78137 ± 33.294
2025-05-09 14:52:23,089 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-89.91552, -106.007935, -87.47049, -180.016, -152.32712, -93.03181, -164.41835, -89.78689, -140.58704, -104.2525]
2025-05-09 14:52:23,090 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 14:52:23,095 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 3 hours, 16 minutes, 47 seconds)
2025-05-09 14:54:53,542 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 14:55:06,354 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -120.86877 ± 31.905
2025-05-09 14:55:06,354 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-62.12705, -112.46326, -163.00127, -128.10979, -73.21755, -98.33108, -142.14404, -139.36278, -151.89055, -138.04034]
2025-05-09 14:55:06,354 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 14:55:06,359 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 3 hours, 13 minutes, 29 seconds)
2025-05-09 14:57:36,741 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 14:57:49,514 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -96.66299 ± 74.859
2025-05-09 14:57:49,514 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-161.86661, 46.09965, -135.54938, -184.60869, -145.41788, 12.726423, -149.19235, -29.88464, -121.08123, -97.855194]
2025-05-09 14:57:49,514 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 14:57:49,520 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 3 hours, 10 minutes, 35 seconds)
2025-05-09 15:00:19,946 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 15:00:32,695 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -127.22668 ± 46.954
2025-05-09 15:00:32,695 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-134.96469, -91.38245, -120.13785, -190.02406, -142.2232, -147.17316, -108.59045, -105.29838, -29.58829, -202.88443]
2025-05-09 15:00:32,695 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 15:00:32,701 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 3 hours, 7 minutes, 40 seconds)
2025-05-09 15:03:03,040 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 15:03:15,790 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -115.92381 ± 42.274
2025-05-09 15:03:15,790 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-138.98541, -130.96626, -174.5909, -116.325584, -105.53771, -73.739, -119.5181, -182.60583, -77.5284, -39.440845]
2025-05-09 15:03:15,790 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 15:03:15,796 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 3 hours, 4 minutes, 56 seconds)
2025-05-09 15:05:46,221 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 15:05:58,987 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -140.53625 ± 78.034
2025-05-09 15:05:58,987 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-155.95834, -73.558525, -112.70828, -357.41467, -112.54888, -110.54829, -85.41187, -134.14005, -172.97725, -90.09637]
2025-05-09 15:05:58,988 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 15:05:58,994 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 3 hours, 2 minutes, 13 seconds)
2025-05-09 15:08:29,823 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 15:08:42,590 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -142.78391 ± 47.352
2025-05-09 15:08:42,591 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-114.13266, -116.53712, -252.16483, -155.14175, -136.3048, -70.084595, -126.44402, -109.551895, -164.23074, -183.24667]
2025-05-09 15:08:42,591 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 15:08:42,597 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 59 minutes, 34 seconds)
2025-05-09 15:11:12,939 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 15:11:25,647 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -106.69594 ± 27.795
2025-05-09 15:11:25,648 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-78.02218, -140.67444, -127.58432, -95.122765, -121.54567, -44.11277, -117.72414, -109.95505, -96.974724, -135.24333]
2025-05-09 15:11:25,648 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 15:11:25,654 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 56 minutes, 49 seconds)
2025-05-09 15:13:55,967 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 15:14:08,724 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -106.58484 ± 86.508
2025-05-09 15:14:08,724 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-131.16429, -74.824905, 29.374424, -90.263824, -88.7705, -325.0067, -109.116356, -47.929546, -76.07206, -152.07458]
2025-05-09 15:14:08,724 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 15:14:08,730 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 54 minutes, 5 seconds)
2025-05-09 15:16:38,967 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 15:16:51,743 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -109.19785 ± 33.362
2025-05-09 15:16:51,743 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-110.566185, -95.824905, -83.41021, -86.893425, -163.166, -75.15812, -87.94017, -120.36452, -90.3026, -178.35243]
2025-05-09 15:16:51,743 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 15:16:51,749 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 51 minutes, 21 seconds)
2025-05-09 15:19:22,091 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 15:19:34,844 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -93.88984 ± 39.783
2025-05-09 15:19:34,844 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-87.32201, -99.23833, -97.39931, -125.82738, -85.48403, -57.846878, -2.5688968, -160.12492, -112.72193, -110.36471]
2025-05-09 15:19:34,844 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 15:19:34,851 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 48 minutes, 36 seconds)
2025-05-09 15:22:15,758 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 15:22:28,400 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -99.18639 ± 34.960
2025-05-09 15:22:28,401 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-104.8534, -147.58661, -107.78607, -101.3331, -99.012314, -132.82855, -93.838196, -88.81237, -108.36927, -7.4439263]
2025-05-09 15:22:28,401 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 15:22:28,408 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 47 minutes, 54 seconds)
2025-05-09 15:24:58,776 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 15:25:11,415 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -106.88757 ± 91.538
2025-05-09 15:25:11,415 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-119.76023, -295.8602, -101.656265, -92.565445, -131.41185, -180.25233, 5.7788777, -129.82945, -83.57219, 60.25335]
2025-05-09 15:25:11,415 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 15:25:11,421 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 2 hours, 45 minutes, 9 seconds)
2025-05-09 15:27:41,781 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 15:27:54,381 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -99.16662 ± 35.389
2025-05-09 15:27:54,381 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-96.79768, -141.7573, -138.29178, -54.708687, -76.128555, -132.96985, -100.80139, -108.71064, -27.077932, -114.42244]
2025-05-09 15:27:54,381 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 15:27:54,389 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 2 hours, 42 minutes, 22 seconds)
2025-05-09 15:30:25,412 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 15:30:37,943 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -106.20320 ± 14.768
2025-05-09 15:30:37,943 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-84.44596, -103.09803, -131.38692, -112.47901, -92.31806, -102.84772, -126.55278, -98.28872, -118.381836, -92.232994]
2025-05-09 15:30:37,943 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 15:30:37,951 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 2 hours, 39 minutes, 43 seconds)
2025-05-09 15:33:08,899 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 15:33:21,652 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -75.64771 ± 31.242
2025-05-09 15:33:21,652 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-111.67907, -44.386894, -110.4775, -64.93529, -51.605465, -114.29759, -62.89279, -47.040554, -35.197243, -113.96464]
2025-05-09 15:33:21,652 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 15:33:21,653 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (-75.65) for latency MM1Queue_a033_s075
2025-05-09 15:33:21,653 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-09 15:33:21,656 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-sac/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 15:33:21,687 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 2 hours, 37 minutes, 5 seconds)
2025-05-09 15:35:52,170 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 15:36:04,946 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -117.70296 ± 37.598
2025-05-09 15:36:04,946 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-140.5047, -91.190025, -145.45856, -30.444187, -154.23767, -151.63644, -87.54227, -134.36221, -100.38642, -141.26707]
2025-05-09 15:36:04,946 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 15:36:04,954 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 2 hours, 32 minutes, 25 seconds)
2025-05-09 15:38:35,699 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 15:38:48,412 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -104.48900 ± 30.669
2025-05-09 15:38:48,412 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-85.44562, -124.54131, -103.56502, -93.18742, -44.37052, -100.59531, -121.94483, -161.2583, -131.02312, -78.95857]
2025-05-09 15:38:48,412 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 15:38:48,420 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 2 hours, 29 minutes, 46 seconds)
2025-05-09 15:41:18,917 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 15:41:31,642 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -80.57217 ± 54.931
2025-05-09 15:41:31,642 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-103.22938, -110.596146, -93.94232, -145.64432, -111.143135, -137.57527, 32.62901, 3.0152187, -61.839836, -77.39551]
2025-05-09 15:41:31,642 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 15:41:31,650 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 2 hours, 27 minutes, 6 seconds)
2025-05-09 15:44:02,050 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 15:44:14,758 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -96.54738 ± 50.903
2025-05-09 15:44:14,758 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-64.054405, -21.385683, -107.38441, -122.516, -132.54706, -1.5071555, -83.92281, -148.00133, -118.27445, -165.8805]
2025-05-09 15:44:14,758 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 15:44:14,766 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 2 hours, 24 minutes, 18 seconds)
2025-05-09 15:46:44,851 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 15:46:57,609 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -98.77690 ± 30.680
2025-05-09 15:46:57,609 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-57.493008, -129.63034, -71.98764, -99.15945, -128.39822, -74.22092, -134.5259, -55.58356, -136.34793, -100.422104]
2025-05-09 15:46:57,609 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 15:46:57,617 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 2 hours, 21 minutes, 25 seconds)
2025-05-09 15:49:27,792 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 15:49:40,557 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -106.38731 ± 34.741
2025-05-09 15:49:40,557 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-105.446884, -97.32775, -58.179188, -131.0633, -33.359367, -134.46828, -97.28238, -122.59734, -146.47551, -137.67297]
2025-05-09 15:49:40,557 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 15:49:40,565 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 2 hours, 18 minutes, 39 seconds)
2025-05-09 15:52:10,679 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 15:52:23,338 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -78.55074 ± 94.288
2025-05-09 15:52:23,338 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-75.295906, 47.42304, -273.2351, -185.50232, -104.25422, -14.062163, -101.84281, 54.248577, -52.47814, -80.50834]
2025-05-09 15:52:23,338 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 15:52:23,346 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 2 hours, 15 minutes, 49 seconds)
2025-05-09 15:54:53,562 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 15:55:06,183 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -100.37073 ± 31.868
2025-05-09 15:55:06,183 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-94.64574, -149.61656, -139.27304, -32.013386, -74.39535, -94.93675, -105.84228, -83.19541, -109.38233, -120.4065]
2025-05-09 15:55:06,183 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 15:55:06,193 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 2 hours, 13 minutes, 2 seconds)
2025-05-09 15:57:36,728 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 15:57:49,435 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -91.53149 ± 70.846
2025-05-09 15:57:49,435 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-283.75708, -84.885445, -65.527214, -75.87333, -120.86065, -58.225025, -103.68435, -5.2503996, -75.179245, -42.07213]
2025-05-09 15:57:49,435 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 15:57:49,443 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 2 hours, 10 minutes, 20 seconds)
2025-05-09 16:00:19,983 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 16:00:32,702 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -97.38045 ± 26.406
2025-05-09 16:00:32,702 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-138.35194, -69.103035, -65.77607, -89.6963, -73.18077, -132.80078, -81.18008, -133.7386, -94.545555, -95.43131]
2025-05-09 16:00:32,702 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 16:00:32,712 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 2 hours, 7 minutes, 41 seconds)
2025-05-09 16:03:02,557 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 16:03:15,180 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -87.68477 ± 17.909
2025-05-09 16:03:15,181 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-108.903015, -112.51224, -92.3625, -84.095535, -90.82482, -106.21115, -75.578835, -60.62313, -87.61679, -58.11971]
2025-05-09 16:03:15,181 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 16:03:15,189 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 2 hours, 4 minutes, 54 seconds)
2025-05-09 16:05:45,153 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 16:05:57,814 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -112.91646 ± 75.742
2025-05-09 16:05:57,815 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-108.32674, -336.131, -70.171455, -72.33433, -70.260895, -76.533455, -93.531906, -104.57165, -103.786316, -93.516846]
2025-05-09 16:05:57,815 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 16:05:57,823 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 2 hours, 2 minutes, 10 seconds)
2025-05-09 16:08:27,903 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 16:08:40,625 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -39.11353 ± 74.389
2025-05-09 16:08:40,625 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-88.60087, -123.01989, -66.00873, -82.00696, 158.54933, -64.836266, -45.677532, -13.474932, -70.30368, 4.244168]
2025-05-09 16:08:40,625 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 16:08:40,625 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (-39.11) for latency MM1Queue_a033_s075
2025-05-09 16:08:40,626 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-09 16:08:40,629 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-sac/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 16:08:40,641 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 59 minutes, 27 seconds)
2025-05-09 16:11:10,831 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 16:11:23,506 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -101.90234 ± 64.110
2025-05-09 16:11:23,506 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-112.1069, -160.31825, -118.11943, -117.24176, -100.14821, -137.15451, -112.31693, -121.22878, 84.69919, -125.08777]
2025-05-09 16:11:23,506 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 16:11:23,517 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 56 minutes, 41 seconds)
2025-05-09 16:13:53,515 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 16:14:06,195 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -129.75359 ± 60.206
2025-05-09 16:14:06,195 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-154.12968, -114.9566, -63.128685, -110.78854, -96.0797, -101.43053, -292.76715, -144.73845, -133.23747, -86.279]
2025-05-09 16:14:06,195 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 16:14:06,205 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 53 minutes, 53 seconds)
2025-05-09 16:16:36,229 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 16:16:48,905 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -98.68932 ± 46.475
2025-05-09 16:16:48,905 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-96.44582, -74.644325, -128.79424, -204.74115, -87.96602, -132.51718, -96.668785, -19.614653, -72.10811, -73.392914]
2025-05-09 16:16:48,905 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 16:16:48,917 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 51 minutes, 12 seconds)
2025-05-09 16:19:18,642 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 16:19:31,342 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -101.08833 ± 44.251
2025-05-09 16:19:31,342 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-142.57314, -114.16484, -118.08324, -89.37268, -166.07994, -108.167206, -23.135433, -22.388897, -99.08352, -127.83442]
2025-05-09 16:19:31,342 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 16:19:31,352 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 48 minutes, 28 seconds)
2025-05-09 16:22:01,090 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 16:22:13,810 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -78.05203 ± 94.979
2025-05-09 16:22:13,810 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-50.205257, -220.92746, -126.14437, 35.683464, -93.710945, -37.411156, -231.11258, -83.53243, -63.011135, 89.85148]
2025-05-09 16:22:13,810 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 16:22:13,820 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 45 minutes, 42 seconds)
2025-05-09 16:24:43,563 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 16:24:56,282 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -135.50969 ± 57.827
2025-05-09 16:24:56,282 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-151.53847, -130.23132, -243.6361, -96.73348, -160.291, -122.13556, -167.81462, -182.45128, -24.236666, -76.028465]
2025-05-09 16:24:56,282 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 16:24:56,294 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 42 minutes, 57 seconds)
2025-05-09 16:27:26,059 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 16:27:38,765 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -83.39558 ± 42.535
2025-05-09 16:27:38,765 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-46.587135, -93.43722, -115.75667, -98.05402, -20.955286, -109.31497, -72.43612, -124.21429, -143.40343, -9.796599]
2025-05-09 16:27:38,765 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 16:27:38,776 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 40 minutes, 13 seconds)
2025-05-09 16:30:09,055 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 16:30:21,746 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -91.30849 ± 51.541
2025-05-09 16:30:21,746 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-102.92763, -12.472034, -157.03441, -101.71544, 7.061101, -102.0528, -127.73769, -157.28873, -83.514305, -75.40292]
2025-05-09 16:30:21,746 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 16:30:21,758 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 37 minutes, 32 seconds)
2025-05-09 16:32:52,086 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 16:33:04,795 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -98.02306 ± 60.959
2025-05-09 16:33:04,796 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-100.788284, -82.79567, -104.83743, 19.24413, -176.15134, -96.60969, -91.366905, -70.25655, -218.80557, -57.863277]
2025-05-09 16:33:04,796 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 16:33:04,806 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 34 minutes, 54 seconds)
2025-05-09 16:35:34,887 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 16:35:47,594 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -115.30852 ± 44.256
2025-05-09 16:35:47,594 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-65.73821, -87.29178, -126.03401, -113.727325, -152.77386, -212.11044, -44.710186, -99.854935, -121.91842, -128.92603]
2025-05-09 16:35:47,594 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 16:35:47,606 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 32 minutes, 13 seconds)
2025-05-09 16:38:17,489 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 16:38:30,164 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -95.29597 ± 53.297
2025-05-09 16:38:30,164 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-132.23608, -84.17691, -9.933685, -77.69614, -65.98533, -226.27914, -88.54225, -117.01341, -77.51722, -73.579636]
2025-05-09 16:38:30,164 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 16:38:30,175 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 29 minutes, 31 seconds)
2025-05-09 16:41:00,040 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 16:41:12,743 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -94.60457 ± 53.988
2025-05-09 16:41:12,743 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-110.92225, -65.09979, -67.58922, -69.485054, -155.43176, -42.058754, -40.73352, -93.21447, -224.84656, -76.66437]
2025-05-09 16:41:12,744 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 16:41:12,754 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 26 minutes, 49 seconds)
2025-05-09 16:43:42,590 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 16:43:55,270 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -82.53465 ± 88.184
2025-05-09 16:43:55,271 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2.568594, -283.5203, -3.707559, -130.85309, -132.32443, -32.890125, 40.863605, -87.34994, -119.536026, -78.5972]
2025-05-09 16:43:55,271 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 16:43:55,284 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 24 minutes, 3 seconds)
2025-05-09 16:46:25,043 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 16:46:37,720 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -140.09634 ± 78.801
2025-05-09 16:46:37,720 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-55.0384, -238.65482, -66.930336, -58.8277, -167.18385, -117.8977, -312.12927, -94.021194, -135.27666, -155.00366]
2025-05-09 16:46:37,720 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 16:46:37,731 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 21 minutes, 17 seconds)
2025-05-09 16:49:07,365 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 16:49:20,002 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -79.68143 ± 26.304
2025-05-09 16:49:20,002 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-69.4092, -78.24415, -60.835396, -46.497604, -86.136055, -73.40608, -87.530304, -115.85869, -132.53471, -46.362114]
2025-05-09 16:49:20,002 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 16:49:20,016 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 18 minutes, 31 seconds)
2025-05-09 16:51:49,654 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 16:52:02,287 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -105.53841 ± 31.489
2025-05-09 16:52:02,287 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-90.2987, -169.71584, -76.33616, -105.17052, -104.41747, -85.65086, -77.07555, -99.32757, -161.15855, -86.23295]
2025-05-09 16:52:02,287 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 16:52:02,299 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 15 minutes, 47 seconds)
2025-05-09 16:54:32,112 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 16:54:44,733 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -19.14673 ± 88.647
2025-05-09 16:54:44,733 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [181.28493, 21.331985, -27.329237, -129.44653, -37.486755, 4.00296, -96.075226, 68.08756, -106.353035, -69.48399]
2025-05-09 16:54:44,733 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 16:54:44,733 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (-19.15) for latency MM1Queue_a033_s075
2025-05-09 16:54:44,733 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-09 16:54:44,736 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-sac/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 16:54:44,751 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 13 minutes, 4 seconds)
2025-05-09 16:57:14,681 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 16:57:27,347 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -116.63033 ± 81.099
2025-05-09 16:57:27,347 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-67.756546, -108.138504, -329.99454, -91.425285, -63.74537, -64.70995, -96.30131, -195.16652, -46.860416, -102.20473]
2025-05-09 16:57:27,347 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 16:57:27,361 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 10 minutes, 22 seconds)
2025-05-09 16:59:57,291 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 17:00:10,029 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -77.96854 ± 35.521
2025-05-09 17:00:10,030 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-119.01578, -64.81687, -119.627495, -72.12851, -127.59355, -29.257568, -25.21291, -103.59624, -61.604298, -56.83214]
2025-05-09 17:00:10,030 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 17:00:10,041 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 7 minutes, 41 seconds)
2025-05-09 17:02:40,339 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 17:02:52,998 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -50.69277 ± 87.558
2025-05-09 17:02:52,998 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-78.70383, 42.349915, -26.336548, -71.19981, 40.534286, -90.8693, -236.00034, -54.255436, -115.07676, 82.63005]
2025-05-09 17:02:52,998 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 17:02:53,010 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 5 minutes, 2 seconds)
2025-05-09 17:05:23,498 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 17:05:36,280 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -54.37305 ± 35.334
2025-05-09 17:05:36,280 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-52.67722, -44.248894, -83.66698, -40.946953, -37.81847, -62.732864, 28.184628, -50.367264, -104.78459, -94.67196]
2025-05-09 17:05:36,280 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 17:05:36,292 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 2 minutes, 24 seconds)
2025-05-09 17:08:06,333 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 17:08:19,117 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -76.52986 ± 40.770
2025-05-09 17:08:19,118 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-101.241196, -32.58275, 6.3563857, -102.5021, -44.49441, -128.04094, -67.206116, -70.80122, -103.353645, -121.43261]
2025-05-09 17:08:19,118 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 17:08:19,130 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 59 minutes, 43 seconds)
2025-05-09 17:10:49,019 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 17:11:01,780 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -82.94872 ± 46.028
2025-05-09 17:11:01,781 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-44.195103, -69.043846, -116.88456, -56.082897, -43.250355, -207.29037, -72.3014, -65.16955, -84.503, -70.766136]
2025-05-09 17:11:01,781 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 17:11:01,795 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 57 minutes)
2025-05-09 17:13:31,582 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 17:13:44,303 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -127.60396 ± 97.424
2025-05-09 17:13:44,303 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-170.19876, -242.03262, -216.03957, -296.19113, -17.900663, -122.30965, -128.91302, -38.56556, -48.815834, 4.9272203]
2025-05-09 17:13:44,303 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 17:13:44,316 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 54 minutes, 17 seconds)
2025-05-09 17:16:14,019 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 17:16:26,693 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -82.83412 ± 75.140
2025-05-09 17:16:26,693 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-202.12361, -48.144775, 8.507723, -165.88832, 14.715499, -16.727577, -105.81451, -93.17477, -179.86499, -39.82588]
2025-05-09 17:16:26,694 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 17:16:26,706 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 51 minutes, 32 seconds)
2025-05-09 17:18:56,374 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 17:19:08,972 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -74.85143 ± 32.173
2025-05-09 17:19:08,972 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-53.964355, -95.00553, -136.4017, -80.36925, -77.0396, -107.17074, -64.03733, -42.206894, -76.25171, -16.06706]
2025-05-09 17:19:08,972 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 17:19:08,985 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 48 minutes, 45 seconds)
2025-05-09 17:21:38,576 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 17:21:51,313 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -60.07636 ± 64.492
2025-05-09 17:21:51,313 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [52.58698, -33.125725, -79.506226, -58.173576, -209.83665, -78.77776, -24.560926, -100.93034, -14.900777, -53.53863]
2025-05-09 17:21:51,313 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 17:21:51,329 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 46 minutes, 1 second)
2025-05-09 17:24:21,089 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 17:24:33,831 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -73.72170 ± 47.840
2025-05-09 17:24:33,831 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-96.72015, -90.01596, -176.90848, -60.312695, 7.365852, -81.95906, -6.490601, -72.16608, -76.81474, -83.19516]
2025-05-09 17:24:33,831 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 17:24:33,844 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 43 minutes, 18 seconds)
2025-05-09 17:27:03,557 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 17:27:16,272 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -42.79438 ± 62.847
2025-05-09 17:27:16,272 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [27.991028, 22.581213, 63.82705, -114.09182, -50.82278, -14.721199, -76.03592, -57.257946, -89.69204, -139.72145]
2025-05-09 17:27:16,272 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 17:27:16,287 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 40 minutes, 35 seconds)
2025-05-09 17:29:46,034 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 17:29:58,616 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -71.63174 ± 64.262
2025-05-09 17:29:58,616 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-61.400566, -99.08068, -109.82566, -36.41846, -13.882997, -47.55883, -159.04466, -95.36803, 63.570057, -157.30763]
2025-05-09 17:29:58,616 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 17:29:58,629 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 37 minutes, 53 seconds)
2025-05-09 17:32:28,430 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 17:32:41,108 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -93.57339 ± 77.607
2025-05-09 17:32:41,108 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-87.773315, -117.02727, -82.55488, 4.149424, -52.20754, -156.33627, 55.159966, -124.4094, -144.6135, -230.12112]
2025-05-09 17:32:41,108 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 17:32:41,124 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 35 minutes, 11 seconds)
2025-05-09 17:35:11,116 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 17:35:23,594 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -64.78502 ± 45.681
2025-05-09 17:35:23,595 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-18.830107, -50.057224, -8.144885, -113.93335, -2.2386854, -104.99612, -75.146576, -95.18501, -39.077133, -140.24107]
2025-05-09 17:35:23,595 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 17:35:23,608 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 32 minutes, 29 seconds)
2025-05-09 17:37:53,945 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 17:38:06,590 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -69.37453 ± 70.800
2025-05-09 17:38:06,590 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [10.462528, -99.399, -88.340195, -14.179693, 19.13431, -143.02036, -52.360268, -40.934166, -227.34512, -57.763306]
2025-05-09 17:38:06,590 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 17:38:06,604 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 29 minutes, 48 seconds)
2025-05-09 17:40:36,705 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 17:40:49,404 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -106.10364 ± 77.441
2025-05-09 17:40:49,404 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-125.57673, -153.01398, -262.5062, -36.67999, 38.96642, -110.34026, -87.04128, -41.60375, -141.54156, -141.69916]
2025-05-09 17:40:49,404 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 17:40:49,418 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 27 minutes, 6 seconds)
2025-05-09 17:43:19,210 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 17:43:31,806 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -91.23707 ± 86.140
2025-05-09 17:43:31,806 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [6.507195, -84.45856, -62.1631, -80.27297, -64.85756, -63.866642, -78.36173, -278.62262, -218.52556, 12.250848]
2025-05-09 17:43:31,806 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 17:43:31,820 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 24 minutes, 23 seconds)
2025-05-09 17:46:01,662 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 17:46:14,362 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -113.30042 ± 68.342
2025-05-09 17:46:14,363 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-62.743214, -115.52076, -45.415565, -98.27446, -71.50653, -62.70273, -121.83397, -190.94124, -280.66226, -83.403496]
2025-05-09 17:46:14,363 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 17:46:14,377 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 21 minutes, 41 seconds)
2025-05-09 17:48:44,063 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 17:48:56,760 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -80.51040 ± 111.250
2025-05-09 17:48:56,761 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-49.43892, -208.20912, -107.227875, -126.19491, -87.68369, -93.832726, -48.995716, 187.62642, -24.87498, -246.27243]
2025-05-09 17:48:56,761 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 17:48:56,775 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 18 minutes, 58 seconds)
2025-05-09 17:51:26,602 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 17:51:39,240 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 20.69181 ± 94.906
2025-05-09 17:51:39,241 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [149.31927, -1.6823878, -87.755905, 79.36791, -123.46277, 41.06245, 156.82939, -88.87438, -10.222161, 92.336685]
2025-05-09 17:51:39,241 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 17:51:39,241 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (20.69) for latency MM1Queue_a033_s075
2025-05-09 17:51:39,241 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-09 17:51:39,244 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-sac/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 17:51:39,262 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 16 minutes, 15 seconds)
2025-05-09 17:54:09,048 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 17:54:21,764 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -71.61747 ± 94.617
2025-05-09 17:54:21,764 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-176.78453, -141.2077, -46.18333, -82.06598, -46.36498, -63.939804, -116.71284, -155.06688, 179.68916, -67.537796]
2025-05-09 17:54:21,764 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 17:54:21,779 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 13 minutes, 32 seconds)
2025-05-09 17:56:51,743 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 17:57:04,419 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -56.73107 ± 40.902
2025-05-09 17:57:04,419 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-5.268923, -15.95153, -107.0286, -71.521225, -59.00575, -79.11066, -55.718872, -116.106575, -73.94795, 16.349403]
2025-05-09 17:57:04,419 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 17:57:04,434 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 10 minutes, 50 seconds)
2025-05-09 17:59:34,444 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 17:59:47,034 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -84.94691 ± 90.146
2025-05-09 17:59:47,034 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-15.889618, -116.47435, -186.04427, -75.1805, 34.35294, -288.13733, -3.2318063, -32.09108, -93.099724, -73.67342]
2025-05-09 17:59:47,034 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 17:59:47,049 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 8 minutes, 7 seconds)
2025-05-09 18:02:16,973 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 18:02:29,657 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -36.10455 ± 43.965
2025-05-09 18:02:29,657 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [35.37352, -105.85486, -11.362907, -31.868816, 5.727025, -19.119982, -85.50248, 0.22209251, -75.844734, -72.8144]
2025-05-09 18:02:29,657 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 18:02:29,675 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 5 minutes, 25 seconds)
2025-05-09 18:04:59,690 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 18:05:12,413 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -38.00144 ± 62.930
2025-05-09 18:05:12,414 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-75.88721, -90.897804, -16.645296, 90.77965, -58.31506, -35.99579, -109.112755, 63.21199, -67.270454, -79.8817]
2025-05-09 18:05:12,414 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 18:05:12,429 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 42 seconds)
2025-05-09 18:07:42,565 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 18:07:55,309 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -94.96854 ± 100.988
2025-05-09 18:07:55,309 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-230.04208, 49.037476, -122.85499, -294.1408, -119.53536, -32.12515, -7.681534, -125.75163, -63.54077, -3.050613]
2025-05-09 18:07:55,309 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 18:07:55,325 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1251 [DEBUG]: Training session finished
