2025-05-10 06:35:18,860 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-sac-aug-mem16
2025-05-10 06:35:18,860 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-sac-aug-mem16
2025-05-10 06:35:18,860 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x71a81b63df70>}
2025-05-10 06:35:18,860 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1111 [DEBUG]: using device: cpu
2025-05-10 06:35:18,860 baseline-sac-noisy-halfcheetah:77 [WARNING]: args.memorize_actions != args.horizon: 16 != 24
2025-05-10 06:35:18,866 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1133 [INFO]: Creating new trainer
2025-05-10 06:35:18,872 baseline-sac-noisy-halfcheetah:111 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=113, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-05-10 06:35:18,872 baseline-sac-noisy-halfcheetah:112 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=119, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-10 06:35:19,072 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1194 [DEBUG]: Starting training session...
2025-05-10 06:35:19,072 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 1/100
2025-05-10 06:37:54,310 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 06:38:11,619 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -771.00769 ± 29.674
2025-05-10 06:38:11,619 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-763.7518, -815.55963, -817.0333, -764.5156, -724.39465, -782.9973, -746.05023, -781.33636, -781.51764, -732.9202]
2025-05-10 06:38:11,619 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 06:38:11,619 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (-771.01) for latency MM1Queue_a033_s075
2025-05-10 06:38:11,620 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-10 06:38:11,623 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 06:38:11,629 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 4 hours, 44 minutes, 43 seconds)
2025-05-10 06:40:55,507 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 06:41:12,803 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -257.87869 ± 98.063
2025-05-10 06:41:12,803 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-284.85147, -101.62447, -390.5103, -362.81403, -207.6183, -358.28226, -150.86055, -197.1764, -348.0514, -176.99777]
2025-05-10 06:41:12,804 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 06:41:12,804 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (-257.88) for latency MM1Queue_a033_s075
2025-05-10 06:41:12,804 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-10 06:41:12,808 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 06:41:12,814 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 4 hours, 48 minutes, 53 seconds)
2025-05-10 06:43:56,830 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 06:44:14,085 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -370.15643 ± 76.900
2025-05-10 06:44:14,086 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-434.50125, -290.3495, -472.94858, -352.2522, -232.97705, -413.35922, -284.58463, -468.37408, -373.42365, -378.79416]
2025-05-10 06:44:14,086 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 06:44:14,087 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 4 hours, 48 minutes, 18 seconds)
2025-05-10 06:46:58,603 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 06:47:15,845 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -298.68741 ± 52.516
2025-05-10 06:47:15,845 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-207.53828, -358.0551, -304.40167, -250.05887, -312.26526, -277.6544, -263.33374, -371.69363, -270.17694, -371.69592]
2025-05-10 06:47:15,845 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 06:47:15,847 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 4 hours, 46 minutes, 42 seconds)
2025-05-10 06:49:59,844 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 06:50:17,126 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -182.36617 ± 63.164
2025-05-10 06:50:17,126 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-198.37169, -97.72721, -247.89333, -195.66438, -171.56427, -36.58723, -196.54102, -207.66429, -242.23526, -229.41293]
2025-05-10 06:50:17,126 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 06:50:17,126 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (-182.37) for latency MM1Queue_a033_s075
2025-05-10 06:50:17,126 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-10 06:50:17,130 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 06:50:17,137 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 4 hours, 44 minutes, 23 seconds)
2025-05-10 06:53:01,277 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 06:53:18,651 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -277.34732 ± 122.962
2025-05-10 06:53:18,651 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-361.92654, -124.00907, -158.14185, -411.2611, -482.78992, -388.38437, -236.91803, -165.88727, -139.35692, -304.79797]
2025-05-10 06:53:18,651 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 06:53:18,653 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 4 hours, 44 minutes, 12 seconds)
2025-05-10 06:56:03,163 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 06:56:20,449 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -228.34750 ± 58.467
2025-05-10 06:56:20,449 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-205.42877, -268.36545, -175.52936, -198.10721, -276.73758, -361.1537, -170.64238, -255.37108, -166.84508, -205.29462]
2025-05-10 06:56:20,449 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 06:56:20,451 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 4 hours, 41 minutes, 22 seconds)
2025-05-10 06:59:06,242 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 06:59:23,568 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -239.01265 ± 44.079
2025-05-10 06:59:23,568 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-298.82364, -218.55046, -175.63626, -216.8016, -260.13586, -212.43245, -274.7048, -294.8998, -171.22427, -266.9171]
2025-05-10 06:59:23,568 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 06:59:23,570 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 4 hours, 38 minutes, 54 seconds)
2025-05-10 07:02:08,551 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 07:02:25,852 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -215.12215 ± 78.002
2025-05-10 07:02:25,852 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-177.20964, -147.69261, -211.36896, -119.49053, -335.68967, -191.26424, -288.93445, -204.3308, -128.03732, -347.20303]
2025-05-10 07:02:25,852 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 07:02:25,855 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 4 hours, 36 minutes, 2 seconds)
2025-05-10 07:05:10,746 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 07:05:28,057 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -168.31461 ± 53.607
2025-05-10 07:05:28,058 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-163.81047, -142.98512, -183.25989, -225.17552, -175.99477, -179.84354, -65.95138, -186.96587, -261.27252, -97.8869]
2025-05-10 07:05:28,058 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 07:05:28,058 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (-168.31) for latency MM1Queue_a033_s075
2025-05-10 07:05:28,058 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-10 07:05:28,062 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 07:05:28,077 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 4 hours, 33 minutes, 16 seconds)
2025-05-10 07:08:13,072 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 07:08:30,355 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -202.73923 ± 68.178
2025-05-10 07:08:30,355 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-201.75249, -179.91606, -320.54654, -286.2333, -197.33165, -281.30728, -184.8688, -143.87268, -115.01471, -116.54879]
2025-05-10 07:08:30,355 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 07:08:30,358 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 4 hours, 30 minutes, 28 seconds)
2025-05-10 07:11:15,106 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 07:11:32,441 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -187.61296 ± 102.428
2025-05-10 07:11:32,441 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-411.84625, -143.69502, -205.07373, -91.08647, -330.96387, -107.79935, -184.05743, -203.98073, -91.296776, -106.32997]
2025-05-10 07:11:32,441 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 07:11:32,444 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 4 hours, 27 minutes, 31 seconds)
2025-05-10 07:14:17,007 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 07:14:34,369 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -153.82895 ± 31.970
2025-05-10 07:14:34,369 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-144.86794, -135.33318, -119.73659, -167.79892, -158.29922, -211.15088, -160.19212, -194.53032, -95.25801, -151.12238]
2025-05-10 07:14:34,369 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 07:14:34,369 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (-153.83) for latency MM1Queue_a033_s075
2025-05-10 07:14:34,369 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-10 07:14:34,373 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 07:14:34,381 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 4 hours, 24 minutes, 8 seconds)
2025-05-10 07:17:18,854 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 07:17:36,195 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -134.06160 ± 47.870
2025-05-10 07:17:36,195 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-186.54059, -67.71796, -170.20993, -94.86053, -143.23625, -186.79233, -133.57764, -198.25221, -76.29138, -83.13717]
2025-05-10 07:17:36,195 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 07:17:36,195 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (-134.06) for latency MM1Queue_a033_s075
2025-05-10 07:17:36,195 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-10 07:17:36,199 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 07:17:36,207 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 4 hours, 20 minutes, 58 seconds)
2025-05-10 07:20:20,694 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 07:20:38,022 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -168.72176 ± 46.627
2025-05-10 07:20:38,022 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-220.13925, -164.43909, -197.01218, -175.78838, -198.12135, -38.33457, -182.03284, -180.79468, -168.16121, -162.39398]
2025-05-10 07:20:38,022 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 07:20:38,026 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 4 hours, 17 minutes, 49 seconds)
2025-05-10 07:23:22,462 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 07:23:39,807 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -306.13608 ± 103.010
2025-05-10 07:23:39,807 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-246.1084, -269.9412, -474.21573, -89.9886, -392.44836, -246.92215, -385.4042, -367.54477, -244.69678, -344.09076]
2025-05-10 07:23:39,807 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 07:23:39,810 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 4 hours, 14 minutes, 38 seconds)
2025-05-10 07:26:24,253 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 07:26:41,606 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -226.97139 ± 72.927
2025-05-10 07:26:41,606 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-167.73515, -306.16992, -163.3043, -130.42499, -333.61484, -168.2426, -323.981, -283.67557, -214.9347, -177.63074]
2025-05-10 07:26:41,606 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 07:26:41,610 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 4 hours, 11 minutes, 32 seconds)
2025-05-10 07:29:26,035 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 07:29:43,397 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -272.11212 ± 76.065
2025-05-10 07:29:43,397 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-277.14893, -342.49484, -149.66994, -374.25076, -184.01993, -289.317, -328.4878, -168.67975, -261.39938, -345.6531]
2025-05-10 07:29:43,397 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 07:29:43,401 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 4 hours, 8 minutes, 27 seconds)
2025-05-10 07:32:27,729 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 07:32:45,143 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -251.63696 ± 90.632
2025-05-10 07:32:45,143 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-173.15347, -287.69736, -169.5808, -174.37784, -245.17224, -133.066, -260.69482, -260.67175, -412.0381, -399.91714]
2025-05-10 07:32:45,143 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 07:32:45,147 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 4 hours, 5 minutes, 24 seconds)
2025-05-10 07:35:29,664 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 07:35:47,031 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -219.78345 ± 64.299
2025-05-10 07:35:47,031 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-258.25433, -303.53613, -248.09338, -149.16727, -181.27486, -246.37854, -286.6701, -78.63945, -207.81335, -238.00693]
2025-05-10 07:35:47,031 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 07:35:47,036 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 4 hours, 2 minutes, 24 seconds)
2025-05-10 07:38:32,197 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 07:38:49,597 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -158.79466 ± 48.753
2025-05-10 07:38:49,597 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-123.73862, -104.501175, -128.55087, -176.81267, -164.45297, -155.52187, -147.44862, -289.35492, -170.39778, -127.16707]
2025-05-10 07:38:49,597 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 07:38:49,602 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 3 hours, 59 minutes, 34 seconds)
2025-05-10 07:41:34,838 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 07:41:52,269 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -292.37463 ± 36.297
2025-05-10 07:41:52,270 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-277.69342, -291.28592, -287.95157, -312.07297, -315.0242, -196.6104, -295.86224, -334.8499, -287.9244, -324.47144]
2025-05-10 07:41:52,270 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 07:41:52,274 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 3 hours, 56 minutes, 46 seconds)
2025-05-10 07:44:37,052 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 07:44:54,533 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -337.71063 ± 47.481
2025-05-10 07:44:54,533 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-371.89575, -442.3397, -356.9377, -299.3108, -357.86752, -331.1097, -299.08685, -301.54803, -265.90634, -351.10367]
2025-05-10 07:44:54,533 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 07:44:54,538 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 3 hours, 53 minutes, 51 seconds)
2025-05-10 07:47:39,250 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 07:47:56,629 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -170.42873 ± 28.175
2025-05-10 07:47:56,629 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-164.24681, -168.08498, -203.90903, -148.5679, -172.64474, -127.082924, -150.15079, -220.60797, -147.57196, -201.42003]
2025-05-10 07:47:56,629 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 07:47:56,634 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 3 hours, 50 minutes, 54 seconds)
2025-05-10 07:50:41,286 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 07:50:58,685 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -170.24274 ± 32.802
2025-05-10 07:50:58,685 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-245.43228, -142.74376, -118.33126, -152.49992, -174.48837, -163.78627, -171.3789, -164.95717, -163.7813, -205.02823]
2025-05-10 07:50:58,685 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 07:50:58,691 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 3 hours, 47 minutes, 54 seconds)
2025-05-10 07:53:43,374 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 07:54:00,813 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -244.71382 ± 62.651
2025-05-10 07:54:00,813 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-271.1618, -182.85947, -408.23712, -253.72443, -259.92645, -256.82278, -191.76344, -215.64445, -218.99475, -188.0035]
2025-05-10 07:54:00,813 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 07:54:00,818 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 3 hours, 44 minutes, 46 seconds)
2025-05-10 07:56:45,280 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 07:57:02,702 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -177.37227 ± 19.028
2025-05-10 07:57:02,702 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-193.82079, -191.76576, -165.26736, -132.29285, -193.9628, -164.65358, -186.577, -196.2501, -168.63652, -180.49603]
2025-05-10 07:57:02,702 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 07:57:02,707 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 3 hours, 41 minutes, 32 seconds)
2025-05-10 07:59:47,016 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 08:00:04,389 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 69.26359 ± 30.189
2025-05-10 08:00:04,389 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [75.489845, 65.17461, 94.67016, 108.97584, 93.524254, 48.111298, 54.725468, 83.78847, 72.18459, -4.008666]
2025-05-10 08:00:04,389 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 08:00:04,389 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (69.26) for latency MM1Queue_a033_s075
2025-05-10 08:00:04,389 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-10 08:00:04,393 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 08:00:04,404 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 3 hours, 38 minutes, 22 seconds)
2025-05-10 08:02:48,689 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 08:03:06,088 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -142.62314 ± 32.757
2025-05-10 08:03:06,088 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-134.75177, -148.47667, -127.94121, -204.14717, -108.8788, -180.40642, -160.56934, -149.8244, -127.463135, -83.772415]
2025-05-10 08:03:06,088 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 08:03:06,094 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 3 hours, 35 minutes, 14 seconds)
2025-05-10 08:05:50,189 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 08:06:07,683 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -208.47844 ± 46.424
2025-05-10 08:06:07,683 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-189.36703, -234.4503, -233.68843, -315.84567, -148.65808, -226.87144, -153.17056, -172.35321, -200.34953, -210.03015]
2025-05-10 08:06:07,683 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 08:06:07,689 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 3 hours, 32 minutes, 5 seconds)
2025-05-10 08:08:51,886 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 08:09:09,277 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -179.64658 ± 40.242
2025-05-10 08:09:09,277 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-164.06288, -170.72888, -199.54875, -150.48558, -177.65929, -205.30258, -168.67316, -150.28183, -128.34824, -281.37463]
2025-05-10 08:09:09,277 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 08:09:09,284 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 3 hours, 28 minutes, 56 seconds)
2025-05-10 08:11:53,454 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 08:12:10,873 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -140.51291 ± 22.909
2025-05-10 08:12:10,874 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-132.82451, -150.27704, -200.43546, -126.25178, -127.13945, -115.77327, -155.33719, -124.62388, -136.30258, -136.16385]
2025-05-10 08:12:10,874 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 08:12:10,880 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 3 hours, 25 minutes, 51 seconds)
2025-05-10 08:14:54,926 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 08:15:12,325 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -189.53668 ± 56.881
2025-05-10 08:15:12,326 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-260.42184, -197.35156, -162.14108, -251.82356, -137.62163, -160.10245, -156.29457, -111.18347, -295.10623, -163.3204]
2025-05-10 08:15:12,326 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 08:15:12,332 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 3 hours, 22 minutes, 46 seconds)
2025-05-10 08:17:56,252 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 08:18:13,661 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -145.44406 ± 38.428
2025-05-10 08:18:13,662 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-142.18262, -161.02142, -102.01031, -131.0752, -197.89043, -119.045044, -140.2059, -229.25673, -125.18288, -106.57007]
2025-05-10 08:18:13,662 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 08:18:13,668 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 3 hours, 19 minutes, 39 seconds)
2025-05-10 08:20:57,573 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 08:21:14,934 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -115.82542 ± 109.555
2025-05-10 08:21:14,934 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-72.29143, -58.322117, -57.458057, -64.60283, -16.333506, -77.38944, -77.1358, -140.58708, -418.7385, -175.39543]
2025-05-10 08:21:14,934 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 08:21:14,941 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 3 hours, 16 minutes, 34 seconds)
2025-05-10 08:23:59,057 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 08:24:15,670 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -185.45370 ± 16.010
2025-05-10 08:24:15,670 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-202.19539, -184.8671, -182.73714, -168.05023, -201.18826, -209.47285, -153.75383, -175.47385, -184.78952, -192.00882]
2025-05-10 08:24:15,670 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 08:24:15,678 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 3 hours, 13 minutes, 21 seconds)
2025-05-10 08:26:59,429 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 08:27:16,020 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -225.48178 ± 24.295
2025-05-10 08:27:16,020 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-206.68907, -243.59558, -202.78708, -213.5563, -205.15506, -211.25175, -223.45758, -213.52333, -255.42712, -279.37512]
2025-05-10 08:27:16,020 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 08:27:16,027 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 3 hours, 10 minutes, 4 seconds)
2025-05-10 08:29:59,940 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 08:30:16,574 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -214.49797 ± 33.291
2025-05-10 08:30:16,575 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-246.34845, -204.61024, -261.75528, -274.44098, -191.80202, -214.1525, -203.66351, -165.11922, -188.39227, -194.69537]
2025-05-10 08:30:16,575 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 08:30:16,582 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 3 hours, 6 minutes, 52 seconds)
2025-05-10 08:33:00,326 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 08:33:17,078 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -209.53911 ± 12.923
2025-05-10 08:33:17,078 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-204.75635, -227.52792, -196.25058, -195.70816, -200.87689, -191.64746, -210.44048, -221.86261, -227.45114, -218.86945]
2025-05-10 08:33:17,078 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 08:33:17,086 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 3 hours, 3 minutes, 41 seconds)
2025-05-10 08:36:00,738 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 08:36:17,398 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -244.94629 ± 94.139
2025-05-10 08:36:17,398 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-215.55556, -227.73758, -230.72316, -223.69005, -192.46848, -197.99051, -524.54, -221.55945, -194.56285, -220.6354]
2025-05-10 08:36:17,398 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 08:36:17,406 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 3 hours, 29 seconds)
2025-05-10 08:39:01,258 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 08:39:17,995 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -209.48567 ± 37.633
2025-05-10 08:39:17,995 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-181.51456, -175.18985, -160.55081, -190.91228, -274.94882, -250.20877, -219.91968, -253.42827, -216.52568, -171.65785]
2025-05-10 08:39:17,995 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 08:39:18,003 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 2 hours, 57 minutes, 27 seconds)
2025-05-10 08:42:01,792 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 08:42:18,382 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -220.76241 ± 22.720
2025-05-10 08:42:18,382 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-210.89066, -174.09084, -216.53674, -206.27568, -211.89606, -245.57597, -224.89453, -232.86098, -221.40735, -263.19507]
2025-05-10 08:42:18,382 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 08:42:18,390 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 2 hours, 54 minutes, 27 seconds)
2025-05-10 08:45:02,037 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 08:45:18,711 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -237.13675 ± 21.071
2025-05-10 08:45:18,711 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-233.42966, -221.48537, -223.11047, -225.98647, -256.6466, -258.3517, -205.48172, -216.28903, -267.20114, -263.38516]
2025-05-10 08:45:18,711 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 08:45:18,719 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 2 hours, 51 minutes, 24 seconds)
2025-05-10 08:48:02,520 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 08:48:19,251 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -374.79886 ± 58.179
2025-05-10 08:48:19,251 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-429.1731, -413.18793, -365.94104, -314.38327, -322.25192, -373.63766, -483.9316, -277.99948, -356.48083, -411.00174]
2025-05-10 08:48:19,251 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 08:48:19,260 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 2 hours, 48 minutes, 24 seconds)
2025-05-10 08:51:02,942 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 08:51:19,510 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -204.30200 ± 27.061
2025-05-10 08:51:19,510 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-200.63998, -220.89966, -184.18344, -220.21552, -163.72162, -170.09677, -252.95226, -234.68767, -207.43208, -188.19095]
2025-05-10 08:51:19,510 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 08:51:19,519 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 2 hours, 45 minutes, 23 seconds)
2025-05-10 08:54:03,323 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 08:54:20,024 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -189.29581 ± 21.027
2025-05-10 08:54:20,024 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-160.54414, -202.43762, -209.50168, -205.041, -205.6649, -186.9948, -213.48526, -156.88745, -160.16702, -192.23416]
2025-05-10 08:54:20,024 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 08:54:20,033 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 2 hours, 42 minutes, 21 seconds)
2025-05-10 08:57:03,817 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 08:57:20,408 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -120.59347 ± 13.300
2025-05-10 08:57:20,408 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-119.26831, -134.91722, -98.522446, -116.75331, -115.359436, -124.52902, -105.67346, -123.58942, -118.76001, -148.562]
2025-05-10 08:57:20,408 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 08:57:20,417 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 2 hours, 39 minutes, 21 seconds)
2025-05-10 09:00:04,217 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 09:00:20,848 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -142.72955 ± 18.463
2025-05-10 09:00:20,848 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-135.28476, -148.22469, -174.76482, -161.5844, -155.16069, -134.46252, -130.96732, -115.65412, -116.29046, -154.90181]
2025-05-10 09:00:20,848 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 09:00:20,857 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 2 hours, 36 minutes, 22 seconds)
2025-05-10 09:03:04,667 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 09:03:21,238 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -102.93168 ± 16.026
2025-05-10 09:03:21,238 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-111.51518, -61.02762, -104.52629, -87.65925, -107.03122, -102.866455, -109.91104, -116.37346, -115.3764, -113.02986]
2025-05-10 09:03:21,239 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 09:03:21,248 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 2 hours, 33 minutes, 20 seconds)
2025-05-10 09:06:04,914 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 09:06:21,509 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -204.99991 ± 25.998
2025-05-10 09:06:21,509 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-203.97633, -179.13611, -242.22153, -207.40164, -163.43413, -232.98561, -185.17519, -222.35901, -233.92387, -179.38576]
2025-05-10 09:06:21,509 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 09:06:21,518 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 2 hours, 30 minutes, 19 seconds)
2025-05-10 09:09:05,465 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 09:09:21,985 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -199.26064 ± 30.470
2025-05-10 09:09:21,985 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-181.16261, -183.63493, -168.07808, -157.79947, -253.09496, -198.3897, -199.98195, -178.77261, -239.37732, -232.31473]
2025-05-10 09:09:21,985 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 09:09:21,995 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 2 hours, 27 minutes, 19 seconds)
2025-05-10 09:12:05,829 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 09:12:22,354 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -206.06888 ± 26.194
2025-05-10 09:12:22,354 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-151.02126, -240.24435, -204.02205, -172.86359, -225.70961, -214.29453, -216.73932, -208.15437, -234.61108, -193.02853]
2025-05-10 09:12:22,355 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 09:12:22,364 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 2 hours, 24 minutes, 18 seconds)
2025-05-10 09:15:06,102 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 09:15:22,747 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -188.48083 ± 21.527
2025-05-10 09:15:22,747 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-180.29924, -170.20148, -171.38649, -189.8694, -180.06876, -150.52094, -204.64752, -196.8701, -215.54346, -225.401]
2025-05-10 09:15:22,748 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 09:15:22,758 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 2 hours, 21 minutes, 17 seconds)
2025-05-10 09:18:06,303 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 09:18:22,859 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -224.19516 ± 31.358
2025-05-10 09:18:22,859 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-194.61142, -243.51178, -251.83319, -229.58766, -220.04378, -224.98204, -259.7619, -231.32481, -240.74275, -145.5522]
2025-05-10 09:18:22,859 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 09:18:22,869 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 2 hours, 18 minutes, 14 seconds)
2025-05-10 09:21:06,278 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 09:21:22,841 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -206.06516 ± 20.021
2025-05-10 09:21:22,841 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-228.42572, -217.15735, -189.77214, -210.24416, -203.1785, -164.30458, -205.28462, -220.0372, -187.63739, -234.60994]
2025-05-10 09:21:22,841 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 09:21:22,851 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 2 hours, 15 minutes, 11 seconds)
2025-05-10 09:24:06,638 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 09:24:23,252 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -165.76700 ± 22.759
2025-05-10 09:24:23,252 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-171.69333, -146.56503, -142.25038, -217.30508, -173.5421, -183.11703, -172.02586, -136.87552, -166.70523, -147.59047]
2025-05-10 09:24:23,252 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 09:24:23,262 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 2 hours, 12 minutes, 11 seconds)
2025-05-10 09:27:07,055 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 09:27:23,614 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -172.53418 ± 28.187
2025-05-10 09:27:23,614 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-166.5372, -162.13043, -213.48257, -141.50479, -139.26013, -163.10649, -197.41646, -223.1975, -144.65826, -174.04787]
2025-05-10 09:27:23,614 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 09:27:23,625 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 2 hours, 9 minutes, 10 seconds)
2025-05-10 09:30:07,367 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 09:30:23,977 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -58.45624 ± 24.605
2025-05-10 09:30:23,978 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-57.562363, -62.857372, -55.717537, -68.70778, -103.44811, -49.047714, -79.959625, -49.84767, -56.111217, -1.3029934]
2025-05-10 09:30:23,978 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 09:30:23,988 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 2 hours, 6 minutes, 10 seconds)
2025-05-10 09:33:08,020 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 09:33:24,612 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -263.42911 ± 102.503
2025-05-10 09:33:24,613 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-262.3929, -237.06203, -298.55884, -296.70908, -209.93031, -231.75493, -174.20802, -540.547, -152.31352, -230.81454]
2025-05-10 09:33:24,613 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 09:33:24,624 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 3 minutes, 14 seconds)
2025-05-10 09:36:08,420 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 09:36:25,048 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -261.12653 ± 33.394
2025-05-10 09:36:25,049 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-249.06537, -231.15273, -238.66788, -250.61313, -312.92993, -265.2416, -210.28795, -243.19667, -301.38287, -308.7272]
2025-05-10 09:36:25,049 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 09:36:25,060 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 2 hours, 17 seconds)
2025-05-10 09:39:08,971 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 09:39:25,553 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -220.43002 ± 26.218
2025-05-10 09:39:25,553 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-255.78505, -248.36542, -221.22281, -253.53383, -200.08266, -234.76323, -177.16643, -202.28981, -189.99332, -221.09789]
2025-05-10 09:39:25,553 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 09:39:25,565 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 57 minutes, 17 seconds)
2025-05-10 09:42:09,282 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 09:42:25,828 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -29.39270 ± 28.187
2025-05-10 09:42:25,828 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-13.67371, 13.520661, -39.21952, -16.591957, -50.65194, -38.214462, -68.48201, -63.295033, -36.308643, 18.989656]
2025-05-10 09:42:25,828 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 09:42:25,840 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 54 minutes, 16 seconds)
2025-05-10 09:45:09,738 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 09:45:26,965 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -58.19546 ± 15.632
2025-05-10 09:45:26,966 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-37.329876, -39.05506, -80.011154, -59.439705, -35.93253, -58.316727, -54.24941, -76.88321, -72.318565, -68.418396]
2025-05-10 09:45:26,966 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 09:45:26,977 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 51 minutes, 22 seconds)
2025-05-10 09:48:18,855 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 09:48:36,222 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -82.29264 ± 17.090
2025-05-10 09:48:36,222 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-88.032295, -74.25211, -70.46432, -74.60061, -102.71306, -68.21744, -67.695274, -123.851265, -75.35922, -77.74072]
2025-05-10 09:48:36,222 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 09:48:36,234 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 49 minutes, 23 seconds)
2025-05-10 09:51:21,410 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 09:51:38,796 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 27.47336 ± 42.150
2025-05-10 09:51:38,796 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [58.64559, -48.08392, 84.13104, 46.712166, 54.06909, -37.974075, -3.3731415, 31.62607, 22.501944, 66.47883]
2025-05-10 09:51:38,797 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 09:51:38,809 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 46 minutes, 36 seconds)
2025-05-10 09:54:24,079 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 09:54:41,453 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -25.62397 ± 26.927
2025-05-10 09:54:41,453 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-12.521524, -68.660225, -22.38155, -61.58711, -25.58476, -37.188812, -32.69393, 33.336056, -17.171597, -11.786243]
2025-05-10 09:54:41,453 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 09:54:41,466 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 43 minutes, 48 seconds)
2025-05-10 09:57:26,446 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 09:57:43,797 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -22.29139 ± 27.756
2025-05-10 09:57:43,797 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-76.485504, -6.342614, -51.108475, -31.424543, -2.2067342, 20.554766, -24.393679, -25.282927, -37.37887, 11.154718]
2025-05-10 09:57:43,797 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 09:57:43,810 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 40 minutes, 58 seconds)
2025-05-10 10:00:28,644 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 10:00:45,977 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -6.76596 ± 16.770
2025-05-10 10:00:45,977 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-38.160152, -30.848333, -9.809277, -1.8877487, -6.4812737, 11.799802, 6.4229865, 17.4606, -0.8457142, -15.310458]
2025-05-10 10:00:45,977 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 10:00:45,990 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 38 minutes, 1 second)
2025-05-10 10:03:31,222 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 10:03:48,572 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 100.06126 ± 17.013
2025-05-10 10:03:48,572 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [109.39039, 113.96051, 95.46858, 104.73008, 56.237747, 119.18358, 95.44749, 99.89842, 92.08592, 114.20994]
2025-05-10 10:03:48,572 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 10:03:48,572 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (100.06) for latency MM1Queue_a033_s075
2025-05-10 10:03:48,572 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-10 10:03:48,576 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
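The "New best … saving network … Saving evalcopy" sequence above suggests a best-so-far tracker keyed by latency model, with a checkpoint written only on improvement. A sketch under that assumption; the function name and the plain `pickle.dump` of the policy are hypothetical stand-ins for whatever `latency_env.training.utils` actually does:

```python
import os
import pickle

def maybe_save_best(best_so_far, latency_name, mean_reward, policy, ckpt_dir):
    """Save a checkpoint for `latency_name` only when `mean_reward` beats the best seen.

    Returns True when a new best was recorded (and a checkpoint written), else False.
    """
    if mean_reward <= best_so_far.get(latency_name, float("-inf")):
        return False
    best_so_far[latency_name] = mean_reward
    os.makedirs(ckpt_dir, exist_ok=True)
    path = os.path.join(ckpt_dir, f"best_{latency_name}.pkl")
    with open(path, "wb") as f:
        pickle.dump(policy, f)  # the real code saves an "evalcopy" of the SAC agent
    return True
```

With the rewards logged above, a tracker like this would fire at iteration 69 (100.06), skip iteration 70 (77.14), and fire again at iteration 72 (239.18).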
2025-05-10 10:03:48,594 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 34 minutes, 16 seconds)
2025-05-10 10:06:33,673 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 10:06:51,042 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 77.14296 ± 24.247
2025-05-10 10:06:51,042 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [82.05452, 115.417725, 30.412579, 84.42986, 70.62277, 60.689297, 67.44683, 58.158447, 90.19847, 111.99911]
2025-05-10 10:06:51,042 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 10:06:51,055 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 31 minutes, 13 seconds)
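The "estimated time remaining" figures above are consistent with (average wall-clock seconds per completed iteration) × (iterations left), rendered with zero-valued components dropped (e.g. "1 hour, 59 seconds", "1 hour, 4 minutes"). A rough sketch of such an estimator and formatter; both function names are hypothetical:

```python
def eta_seconds(elapsed, done, total):
    """Estimate remaining seconds from the average duration of completed iterations."""
    return elapsed / done * (total - done)

def fmt_hms(seconds):
    """Render seconds as 'H hours, M minutes, S seconds', omitting zero parts."""
    s = int(round(seconds))
    h, s = divmod(s, 3600)
    m, s = divmod(s, 60)
    parts = []
    if h:
        parts.append(f"{h} hour{'s' if h != 1 else ''}")
    if m:
        parts.append(f"{m} minute{'s' if m != 1 else ''}")
    if s:
        parts.append(f"{s} second{'s' if s != 1 else ''}")
    return ", ".join(parts) or "0 seconds"

# At roughly 183 s per iteration, 30 iterations left gives an ETA near the
# "1 hour, 31 minutes, 13 seconds" logged at iteration 71 above.
print(fmt_hms(eta_seconds(183.0 * 70, 70, 100)))
```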
2025-05-10 10:09:36,431 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 10:09:53,828 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 62.82180 ± 11.390
2025-05-10 10:09:53,828 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [66.262436, 65.359085, 49.924828, 52.55334, 70.84326, 48.8951, 86.90879, 54.4879, 72.50689, 60.47631]
2025-05-10 10:09:53,828 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 10:09:53,841 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 28 minutes, 11 seconds)
2025-05-10 10:12:38,964 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 10:12:56,329 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 239.18381 ± 23.274
2025-05-10 10:12:56,330 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [208.92163, 226.99776, 217.20998, 259.05026, 264.7401, 256.18213, 214.77116, 269.44006, 214.74695, 259.77808]
2025-05-10 10:12:56,330 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 10:12:56,330 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (239.18) for latency MM1Queue_a033_s075
2025-05-10 10:12:56,330 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-10 10:12:56,334 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 10:12:56,352 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 25 minutes, 10 seconds)
2025-05-10 10:15:41,758 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 10:15:59,183 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 349.88101 ± 21.351
2025-05-10 10:15:59,183 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [347.88412, 347.1473, 380.62943, 325.9231, 359.81366, 372.59955, 315.528, 376.11102, 326.80453, 346.36935]
2025-05-10 10:15:59,183 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 10:15:59,183 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (349.88) for latency MM1Queue_a033_s075
2025-05-10 10:15:59,183 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-10 10:15:59,187 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 10:15:59,206 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 22 minutes, 11 seconds)
2025-05-10 10:18:44,539 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 10:19:01,996 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 322.53491 ± 22.163
2025-05-10 10:19:01,997 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [357.9666, 301.28247, 309.0863, 347.9903, 308.8144, 284.7904, 338.65146, 336.47614, 332.3487, 307.9423]
2025-05-10 10:19:01,997 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 10:19:02,010 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 19 minutes, 9 seconds)
2025-05-10 10:21:47,080 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 10:22:04,554 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 176.45129 ± 20.832
2025-05-10 10:22:04,554 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [157.06888, 187.1957, 156.26477, 165.60237, 217.03381, 180.40324, 183.63876, 144.04361, 173.37303, 199.88872]
2025-05-10 10:22:04,554 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 10:22:04,568 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 16 minutes, 7 seconds)
2025-05-10 10:24:50,094 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 10:25:07,560 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 185.03160 ± 13.493
2025-05-10 10:25:07,560 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [179.70908, 169.0421, 162.48093, 212.75557, 196.97392, 191.67871, 182.71777, 190.40182, 178.87022, 185.68588]
2025-05-10 10:25:07,560 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 10:25:07,574 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 13 minutes, 5 seconds)
2025-05-10 10:27:52,848 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 10:28:10,302 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 212.36533 ± 24.656
2025-05-10 10:28:10,303 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [193.51556, 227.64362, 223.7451, 182.90611, 207.26585, 189.67023, 250.26938, 177.80371, 246.77834, 224.05539]
2025-05-10 10:28:10,303 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 10:28:10,317 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 10 minutes, 4 seconds)
2025-05-10 10:30:55,770 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 10:31:13,229 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 341.64752 ± 24.164
2025-05-10 10:31:13,229 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [362.324, 375.6215, 361.19568, 318.11166, 321.44156, 342.47787, 341.93713, 289.72882, 349.6382, 353.99854]
2025-05-10 10:31:13,229 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 10:31:13,243 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 7 minutes, 1 second)
2025-05-10 10:33:58,869 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 10:34:16,329 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 291.47183 ± 19.641
2025-05-10 10:34:16,329 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [298.19464, 301.76047, 286.907, 268.15152, 312.15625, 327.33875, 284.00876, 287.89093, 293.641, 254.66905]
2025-05-10 10:34:16,329 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 10:34:16,344 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 4 minutes)
2025-05-10 10:37:01,963 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 10:37:19,400 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 299.98044 ± 11.497
2025-05-10 10:37:19,400 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [293.1269, 290.55103, 284.5693, 324.63953, 290.02188, 295.53284, 301.0073, 299.42697, 310.85184, 310.07687]
2025-05-10 10:37:19,400 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 10:37:19,415 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 59 seconds)
2025-05-10 10:40:05,025 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 10:40:22,500 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 331.52191 ± 19.122
2025-05-10 10:40:22,500 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [350.01797, 307.24695, 356.14478, 330.30435, 317.66977, 314.81705, 306.20236, 348.39233, 357.90186, 326.5221]
2025-05-10 10:40:22,500 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 10:40:22,515 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 57 minutes, 56 seconds)
2025-05-10 10:43:08,112 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 10:43:25,513 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 357.73904 ± 17.424
2025-05-10 10:43:25,513 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [357.11923, 377.89322, 379.34598, 361.13803, 367.66055, 368.27243, 364.9353, 325.0518, 342.5252, 333.44867]
2025-05-10 10:43:25,513 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 10:43:25,513 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (357.74) for latency MM1Queue_a033_s075
2025-05-10 10:43:25,513 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-10 10:43:25,517 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 10:43:25,536 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 54 minutes, 54 seconds)
2025-05-10 10:46:10,987 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 10:46:28,379 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 421.52631 ± 16.302
2025-05-10 10:46:28,380 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [435.709, 418.65775, 420.8051, 412.73526, 427.02237, 416.55453, 382.2091, 431.4078, 447.41788, 422.74423]
2025-05-10 10:46:28,380 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 10:46:28,380 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (421.53) for latency MM1Queue_a033_s075
2025-05-10 10:46:28,380 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-10 10:46:28,384 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 10:46:28,404 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 51 minutes, 51 seconds)
2025-05-10 10:49:13,903 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 10:49:31,315 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 405.46411 ± 10.161
2025-05-10 10:49:31,315 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [392.91937, 402.03424, 416.34998, 421.9315, 396.59872, 397.188, 411.65884, 417.81354, 404.27173, 393.87534]
2025-05-10 10:49:31,315 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 10:49:31,331 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 48 minutes, 47 seconds)
2025-05-10 10:52:16,967 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 10:52:34,374 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 403.93915 ± 18.890
2025-05-10 10:52:34,374 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [411.94028, 409.664, 407.4808, 400.66913, 374.99692, 447.38742, 395.41858, 404.22775, 408.71292, 378.8935]
2025-05-10 10:52:34,375 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 10:52:34,390 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 45 minutes, 44 seconds)
2025-05-10 10:55:20,362 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 10:55:37,830 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 461.12714 ± 15.475
2025-05-10 10:55:37,830 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [458.1617, 487.8054, 459.51834, 465.0071, 439.66928, 452.92535, 459.02747, 435.82495, 475.8168, 477.51523]
2025-05-10 10:55:37,830 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 10:55:37,830 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (461.13) for latency MM1Queue_a033_s075
2025-05-10 10:55:37,830 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-10 10:55:37,834 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 10:55:37,855 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 42 minutes, 42 seconds)
2025-05-10 10:58:23,411 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 10:58:40,860 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 451.18750 ± 15.845
2025-05-10 10:58:40,860 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [462.47797, 433.74472, 447.96863, 442.45602, 416.5875, 452.06604, 457.21982, 467.9676, 470.95856, 460.4281]
2025-05-10 10:58:40,860 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 10:58:40,876 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 39 minutes, 39 seconds)
2025-05-10 11:01:26,698 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 11:01:44,182 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 440.43448 ± 16.876
2025-05-10 11:01:44,183 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [443.0383, 444.0853, 415.70016, 444.6014, 438.65714, 460.34665, 456.14508, 405.27707, 458.1021, 438.3917]
2025-05-10 11:01:44,183 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 11:01:44,199 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 36 minutes, 37 seconds)
2025-05-10 11:04:29,880 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 11:04:47,340 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 495.93320 ± 17.192
2025-05-10 11:04:47,340 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [521.5181, 496.8747, 470.31992, 493.09396, 488.002, 507.2787, 516.08466, 512.75604, 473.91196, 479.49222]
2025-05-10 11:04:47,340 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 11:04:47,340 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (495.93) for latency MM1Queue_a033_s075
2025-05-10 11:04:47,340 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-10 11:04:47,344 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 11:04:47,365 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 33 minutes, 35 seconds)
2025-05-10 11:07:32,980 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 11:07:50,444 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 480.69379 ± 17.165
2025-05-10 11:07:50,444 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [478.25018, 460.90738, 492.93265, 513.33765, 472.43005, 479.24368, 460.57123, 482.97522, 462.63965, 503.6506]
2025-05-10 11:07:50,445 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 11:07:50,461 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 30 minutes, 32 seconds)
2025-05-10 11:10:36,669 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 11:10:54,196 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 541.87079 ± 24.533
2025-05-10 11:10:54,197 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [524.9388, 492.5909, 571.05164, 560.416, 548.5622, 559.4095, 516.44104, 527.6807, 545.8883, 571.7292]
2025-05-10 11:10:54,197 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 11:10:54,197 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (541.87) for latency MM1Queue_a033_s075
2025-05-10 11:10:54,197 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-10 11:10:54,201 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 11:10:54,222 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 27 minutes, 29 seconds)
2025-05-10 11:13:40,051 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 11:13:57,526 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 547.44122 ± 29.668
2025-05-10 11:13:57,526 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [587.0608, 557.3691, 578.9656, 557.0194, 491.7144, 515.71893, 576.04956, 514.052, 549.16364, 547.29846]
2025-05-10 11:13:57,526 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 11:13:57,527 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (547.44) for latency MM1Queue_a033_s075
2025-05-10 11:13:57,527 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-10 11:13:57,531 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 11:13:57,552 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 24 minutes, 26 seconds)
2025-05-10 11:16:43,015 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 11:16:59,715 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 493.21539 ± 16.672
2025-05-10 11:16:59,715 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [489.0352, 481.72488, 485.21515, 485.69608, 475.84503, 514.223, 524.79626, 512.7962, 474.09283, 488.729]
2025-05-10 11:16:59,715 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 11:16:59,732 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 21 minutes, 21 seconds)
2025-05-10 11:19:43,831 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 11:20:00,611 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 512.54425 ± 26.993
2025-05-10 11:20:00,611 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [508.14722, 463.27524, 550.9752, 547.5392, 515.7712, 508.2351, 487.96234, 507.40677, 491.44336, 544.6864]
2025-05-10 11:20:00,611 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 11:20:00,629 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 18 minutes, 15 seconds)
2025-05-10 11:22:44,717 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 11:23:01,413 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 510.36469 ± 19.400
2025-05-10 11:23:01,414 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [501.80212, 516.05035, 487.7265, 528.64685, 541.5739, 525.8426, 529.4601, 491.38596, 483.8813, 497.2773]
2025-05-10 11:23:01,414 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 11:23:01,431 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 15 minutes, 10 seconds)
2025-05-10 11:25:46,184 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 11:26:02,947 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 520.45868 ± 29.081
2025-05-10 11:26:02,947 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [532.5046, 518.0596, 540.2759, 490.18936, 553.89844, 456.67862, 515.22, 562.43, 515.8331, 519.4976]
2025-05-10 11:26:02,947 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 11:26:02,965 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 12 minutes, 6 seconds)
2025-05-10 11:28:47,743 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 11:29:04,288 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 550.83923 ± 16.280
2025-05-10 11:29:04,289 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [554.8564, 562.8957, 572.1374, 559.13184, 565.56775, 522.7094, 559.7998, 548.696, 523.88574, 538.71204]
2025-05-10 11:29:04,289 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 11:29:04,289 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (550.84) for latency MM1Queue_a033_s075
2025-05-10 11:29:04,289 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-10 11:29:04,293 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 11:29:04,315 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 9 minutes, 4 seconds)
2025-05-10 11:31:55,349 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 11:32:12,162 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 547.65393 ± 12.519
2025-05-10 11:32:12,162 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [536.95514, 559.2603, 541.711, 559.7817, 558.7593, 540.51544, 554.94977, 556.00867, 518.7793, 549.8187]
2025-05-10 11:32:12,162 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 11:32:12,180 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 6 minutes, 4 seconds)
2025-05-10 11:35:02,564 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 11:35:19,291 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 536.72931 ± 20.205
2025-05-10 11:35:19,291 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [532.3523, 508.57715, 551.305, 576.84106, 538.8225, 500.9157, 549.27216, 536.6927, 534.73035, 537.78394]
2025-05-10 11:35:19,291 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 11:35:19,309 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 3 minutes, 3 seconds)
2025-05-10 11:38:09,577 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 11:38:26,323 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 528.66907 ± 25.276
2025-05-10 11:38:26,323 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [487.42075, 546.71094, 521.4858, 536.2361, 506.3553, 547.9579, 524.0626, 495.32773, 570.94354, 550.1897]
2025-05-10 11:38:26,323 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 11:38:26,341 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1251 [DEBUG]: Training session finished
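The evaluation blocks above always log a `Total Reward: M ± S` line followed by an `All rewards: [...]` line with the 10 per-episode returns. A minimal offline sanity-check sketch for such pairs follows; the regexes and the `parse_evals` helper are illustrative parsing code written against the format visible above, not part of the training codebase. Note that the `±` figure in these logs matches the *population* standard deviation of the 10 returns (ddof=0, numpy's default), not the sample standard deviation.

```python
import re
import statistics

# Regexes matching the two evaluation lines visible in the log above.
TOTAL_RE = re.compile(r"Total Reward: ([\d.]+) ± ([\d.]+)")
ALL_RE = re.compile(r"All rewards: \[([^\]]+)\]")


def parse_evals(lines):
    """Yield (logged_mean, logged_std, rewards) per evaluation block.

    A block is a 'Total Reward' line followed (possibly after other
    lines) by the next 'All rewards' line.
    """
    pending = None
    for line in lines:
        m = TOTAL_RE.search(line)
        if m:
            pending = (float(m.group(1)), float(m.group(2)))
            continue
        m = ALL_RE.search(line)
        if m and pending is not None:
            rewards = [float(x) for x in m.group(1).split(",")]
            yield pending[0], pending[1], rewards
            pending = None


# Sample taken verbatim from iteration 94 above.
sample = [
    "... [DEBUG]: Total Reward: 512.54425 ± 26.993",
    "... [DEBUG]: All rewards: [508.14722, 463.27524, 550.9752, "
    "547.5392, 515.7712, 508.2351, 487.96234, 507.40677, 491.44336, 544.6864]",
]

for mean, std, rewards in parse_evals(sample):
    # The logged mean is the arithmetic mean of the episode returns.
    assert abs(mean - statistics.mean(rewards)) < 0.01
    # The logged ± is the population standard deviation (ddof=0).
    assert abs(std - statistics.pstdev(rewards)) < 0.01
```

The same check reproduces the "New best" values: for instance, the mean of the ten returns logged at iteration 92 is 547.44, matching the `New best (547.44)` line.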
