2025-05-11 10:54:27,294 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-sac-aug-mem4
2025-05-11 10:54:27,294 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-sac-aug-mem4
2025-05-11 10:54:27,294 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x735ca5440f70>}
2025-05-11 10:54:27,294 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1111 [DEBUG]: using device: cpu
2025-05-11 10:54:27,294 baseline-sac-noisy-halfcheetah:77 [WARNING]: args.memorize_actions != args.horizon: 4 != 24
2025-05-11 10:54:27,303 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1133 [INFO]: Creating new trainer
2025-05-11 10:54:27,312 baseline-sac-noisy-halfcheetah:111 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=41, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-05-11 10:54:27,313 baseline-sac-noisy-halfcheetah:112 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=47, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-11 10:54:27,513 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1194 [DEBUG]: Starting training session...
2025-05-11 10:54:27,513 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 1/100
2025-05-11 10:57:06,742 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 10:57:20,784 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -413.40048 ± 58.824
2025-05-11 10:57:20,784 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-314.114, -428.69702, -498.46396, -493.98492, -377.29254, -329.83502, -420.23007, -458.76865, -394.28262, -418.3357]
2025-05-11 10:57:20,784 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 10:57:20,785 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (-413.40) for latency MM1Queue_a033_s075
2025-05-11 10:57:20,785 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 10:57:20,790 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 10:57:20,795 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 4 hours, 45 minutes, 54 seconds)
2025-05-11 11:00:10,362 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 11:00:24,351 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -250.03015 ± 59.091
2025-05-11 11:00:24,351 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-263.123, -147.9723, -176.91978, -280.01065, -237.58969, -239.36897, -254.44711, -274.32455, -381.13065, -245.41487]
2025-05-11 11:00:24,352 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:00:24,352 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (-250.03) for latency MM1Queue_a033_s075
2025-05-11 11:00:24,352 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 11:00:24,355 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 11:00:24,362 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 4 hours, 51 minutes, 25 seconds)
2025-05-11 11:03:13,686 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 11:03:27,673 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -183.38678 ± 54.011
2025-05-11 11:03:27,673 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-191.12599, -240.32677, -99.59587, -231.21306, -150.40298, -242.00113, -154.42769, -252.61148, -105.00072, -167.16196]
2025-05-11 11:03:27,673 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:03:27,673 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (-183.39) for latency MM1Queue_a033_s075
2025-05-11 11:03:27,674 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 11:03:27,678 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 11:03:27,684 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 4 hours, 51 minutes, 5 seconds)
2025-05-11 11:06:17,323 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 11:06:31,284 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -207.88286 ± 40.727
2025-05-11 11:06:31,284 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-182.97295, -179.55315, -217.87439, -200.65747, -135.1145, -240.37045, -264.2948, -240.91333, -255.37434, -161.70314]
2025-05-11 11:06:31,284 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:06:31,286 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 4 hours, 49 minutes, 30 seconds)
2025-05-11 11:09:20,331 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 11:09:34,302 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -140.89970 ± 58.088
2025-05-11 11:09:34,302 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-129.55675, -132.1384, -66.015724, -187.22588, -142.68979, -128.62032, -43.50312, -120.17379, -229.76064, -229.3126]
2025-05-11 11:09:34,303 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:09:34,303 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (-140.90) for latency MM1Queue_a033_s075
2025-05-11 11:09:34,303 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 11:09:34,307 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 11:09:34,314 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 4 hours, 47 minutes, 9 seconds)
2025-05-11 11:12:23,274 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 11:12:37,285 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -161.31039 ± 62.793
2025-05-11 11:12:37,285 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-195.41412, -224.7439, -178.09848, -53.556618, -113.501625, -71.27652, -125.17967, -234.50572, -229.11684, -187.71046]
2025-05-11 11:12:37,285 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:12:37,288 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 4 hours, 47 minutes, 10 seconds)
2025-05-11 11:15:26,030 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 11:15:39,901 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -135.71938 ± 54.498
2025-05-11 11:15:39,901 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-212.22371, -90.78436, -51.80353, -123.61369, -74.63671, -124.17191, -124.99504, -232.03363, -156.0636, -166.86754]
2025-05-11 11:15:39,901 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:15:39,901 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (-135.72) for latency MM1Queue_a033_s075
2025-05-11 11:15:39,902 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 11:15:39,906 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 11:15:39,913 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 4 hours, 43 minutes, 49 seconds)
2025-05-11 11:18:27,820 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 11:18:42,340 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -163.46248 ± 36.115
2025-05-11 11:18:42,340 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-111.77254, -109.25144, -187.53914, -203.05104, -173.17508, -180.83907, -213.89786, -120.075645, -182.1768, -152.84615]
2025-05-11 11:18:42,341 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:18:42,344 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 4 hours, 40 minutes, 29 seconds)
2025-05-11 11:21:32,346 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 11:21:46,257 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -126.84039 ± 80.721
2025-05-11 11:21:46,258 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-118.002235, -194.44397, -148.97559, -162.59602, -197.90787, -199.74054, 72.678, -46.856884, -104.71094, -167.84787]
2025-05-11 11:21:46,258 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:21:46,258 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (-126.84) for latency MM1Queue_a033_s075
2025-05-11 11:21:46,258 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 11:21:46,263 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 11:21:46,270 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 4 hours, 37 minutes, 32 seconds)
2025-05-11 11:24:32,364 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 11:24:46,309 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -135.89851 ± 51.575
2025-05-11 11:24:46,309 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-118.27462, -121.40436, -120.96401, -218.27376, -219.68356, -127.69452, -166.19942, -127.869026, -101.96773, -36.654022]
2025-05-11 11:24:46,309 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:24:46,312 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 4 hours, 33 minutes, 35 seconds)
2025-05-11 11:27:33,850 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 11:27:47,796 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -103.17454 ± 37.193
2025-05-11 11:27:47,796 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-70.75094, -150.14972, -130.96736, -41.574276, -86.90424, -131.90704, -116.637276, -139.23615, -117.32809, -46.290306]
2025-05-11 11:27:47,796 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:27:47,797 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (-103.17) for latency MM1Queue_a033_s075
2025-05-11 11:27:47,797 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 11:27:47,802 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 11:27:47,809 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 4 hours, 30 minutes, 7 seconds)
2025-05-11 11:30:34,749 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 11:30:48,584 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -87.73561 ± 40.809
2025-05-11 11:30:48,584 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-70.90358, -57.0278, -52.98653, -73.43733, -131.02065, -50.609406, -97.20975, -167.04672, -41.748135, -135.3662]
2025-05-11 11:30:48,584 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:30:48,584 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (-87.74) for latency MM1Queue_a033_s075
2025-05-11 11:30:48,585 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 11:30:48,588 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 11:30:48,596 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 4 hours, 26 minutes, 32 seconds)
2025-05-11 11:33:34,251 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 11:33:47,585 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -56.11600 ± 47.766
2025-05-11 11:33:47,586 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-44.064667, 11.314008, -86.6925, -89.22699, -61.07849, -90.0596, -153.84164, -2.293432, -12.052488, -33.164272]
2025-05-11 11:33:47,586 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:33:47,586 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (-56.12) for latency MM1Queue_a033_s075
2025-05-11 11:33:47,586 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 11:33:47,590 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 11:33:47,597 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 4 hours, 22 minutes, 31 seconds)
2025-05-11 11:36:41,013 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 11:36:53,757 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -22.75306 ± 73.548
2025-05-11 11:36:53,757 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [64.60415, -95.123116, -75.19802, -67.56656, 74.872154, 86.30669, 26.25504, -38.847458, -100.06542, -102.768005]
2025-05-11 11:36:53,757 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:36:53,757 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (-22.75) for latency MM1Queue_a033_s075
2025-05-11 11:36:53,758 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 11:36:53,762 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 11:36:53,769 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 4 hours, 20 minutes, 8 seconds)
2025-05-11 11:39:37,658 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 11:39:50,238 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -24.32071 ± 107.595
2025-05-11 11:39:50,238 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [77.01547, -34.2956, -22.00913, -146.85376, -156.9616, 203.20697, -95.04997, 13.7199545, -124.68469, 42.7052]
2025-05-11 11:39:50,238 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:39:50,242 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 4 hours, 16 minutes, 6 seconds)
2025-05-11 11:42:32,458 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 11:42:45,119 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -64.51726 ± 68.525
2025-05-11 11:42:45,120 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-25.194466, -53.67342, -92.397484, -29.422293, -102.5109, -191.08119, 92.98084, -70.60347, -86.068016, -87.20224]
2025-05-11 11:42:45,120 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:42:45,124 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 4 hours, 11 minutes, 14 seconds)
2025-05-11 11:45:28,078 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 11:45:40,792 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 75.09177 ± 130.584
2025-05-11 11:45:40,793 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [229.16461, -83.82393, 85.95157, 306.35272, 134.83835, -135.91644, 72.18923, -35.39097, 28.58568, 148.96693]
2025-05-11 11:45:40,793 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:45:40,793 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (75.09) for latency MM1Queue_a033_s075
2025-05-11 11:45:40,793 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 11:45:40,797 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 11:45:40,806 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 4 hours, 6 minutes, 50 seconds)
2025-05-11 11:48:24,529 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 11:48:37,425 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 30.62923 ± 83.136
2025-05-11 11:48:37,425 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [113.67393, 115.23792, 13.492276, -2.2588158, -81.62065, 149.6099, 95.02741, -1.3339255, 21.03649, -116.57227]
2025-05-11 11:48:37,425 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:48:37,430 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 4 hours, 3 minutes, 13 seconds)
2025-05-11 11:51:22,121 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 11:51:35,219 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 135.76718 ± 149.274
2025-05-11 11:51:35,219 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [102.39603, 203.94724, 281.6811, 72.097015, -67.72372, 176.47925, 153.61133, -53.427834, 455.23227, 33.37927]
2025-05-11 11:51:35,219 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:51:35,219 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (135.77) for latency MM1Queue_a033_s075
2025-05-11 11:51:35,220 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 11:51:35,224 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 11:51:35,233 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 3 hours, 57 minutes, 59 seconds)
2025-05-11 11:54:20,405 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 11:54:33,593 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 34.92571 ± 142.766
2025-05-11 11:54:33,593 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-80.776924, -202.42078, 33.255554, 317.7682, -76.05961, 8.568643, 129.59558, 118.031235, 161.82971, -60.53454]
2025-05-11 11:54:33,593 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:54:33,598 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 3 hours, 55 minutes, 33 seconds)
2025-05-11 11:57:18,983 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 11:57:32,049 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 69.62719 ± 146.278
2025-05-11 11:57:32,049 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [11.107665, -177.16225, 24.319109, 15.105649, -19.925781, 338.86642, 200.30289, -57.59533, 124.93348, 236.32007]
2025-05-11 11:57:32,049 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:57:32,054 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 3 hours, 53 minutes, 33 seconds)
2025-05-11 12:00:17,333 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 12:00:30,325 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 414.50946 ± 400.175
2025-05-11 12:00:30,325 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-3.9227817, 53.09435, 264.87573, 21.82178, 306.84653, 109.99755, 620.41473, 712.0568, 778.6545, 1281.2555]
2025-05-11 12:00:30,325 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:00:30,326 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (414.51) for latency MM1Queue_a033_s075
2025-05-11 12:00:30,326 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 12:00:30,330 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 12:00:30,340 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 3 hours, 51 minutes, 16 seconds)
2025-05-11 12:03:15,846 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 12:03:28,818 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 608.17188 ± 286.181
2025-05-11 12:03:28,818 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [600.0677, 763.4207, 828.50824, 773.9809, 1029.9026, 922.9655, 371.05704, 317.54456, 123.52532, 350.74597]
2025-05-11 12:03:28,818 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:03:28,819 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (608.17) for latency MM1Queue_a033_s075
2025-05-11 12:03:28,819 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 12:03:28,822 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 12:03:28,832 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 3 hours, 48 minutes, 47 seconds)
2025-05-11 12:06:14,140 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 12:06:26,991 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1050.90552 ± 487.732
2025-05-11 12:06:26,991 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1267.8799, 512.82635, 106.276375, 1547.4536, 1750.3973, 811.6445, 1560.1339, 749.0276, 1077.242, 1126.1737]
2025-05-11 12:06:26,991 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:06:26,992 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (1050.91) for latency MM1Queue_a033_s075
2025-05-11 12:06:26,992 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 12:06:26,996 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 12:06:27,006 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 3 hours, 45 minutes, 54 seconds)
2025-05-11 12:09:15,688 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 12:09:29,306 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 780.94983 ± 425.101
2025-05-11 12:09:29,307 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [435.509, 603.13025, 295.43713, 1474.2603, 816.395, 599.2006, 529.14484, 519.1474, 1654.5427, 882.73114]
2025-05-11 12:09:29,307 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:09:29,313 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 3 hours, 43 minutes, 55 seconds)
2025-05-11 12:12:19,306 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 12:12:33,337 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 910.35876 ± 648.735
2025-05-11 12:12:33,337 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1514.0082, 211.11864, 1466.6173, 521.6422, 1622.2389, 340.18527, 1914.5548, 162.53275, 1113.9369, 236.75261]
2025-05-11 12:12:33,337 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:12:33,342 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 3 hours, 42 minutes, 19 seconds)
2025-05-11 12:15:23,011 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 12:15:36,966 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1449.25781 ± 595.141
2025-05-11 12:15:36,966 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1583.4822, 1817.903, 1847.329, 686.7447, 1821.6715, 93.17173, 1886.8058, 1073.9066, 1966.3824, 1715.1803]
2025-05-11 12:15:36,966 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:15:36,966 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (1449.26) for latency MM1Queue_a033_s075
2025-05-11 12:15:36,967 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 12:15:36,971 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 12:15:36,982 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 3 hours, 40 minutes, 36 seconds)
2025-05-11 12:18:27,037 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 12:18:40,990 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1316.62280 ± 926.108
2025-05-11 12:18:40,990 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [569.61456, 1797.4003, 2326.863, 312.8865, 2164.7468, 2402.3096, 459.95544, 607.33124, 2406.822, 118.29764]
2025-05-11 12:18:40,990 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:18:40,997 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 3 hours, 38 minutes, 55 seconds)
2025-05-11 12:21:31,439 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 12:21:45,383 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1814.31970 ± 717.548
2025-05-11 12:21:45,383 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [674.5455, 2567.0786, 2592.5203, 2381.8296, 2252.9258, 1819.2091, 2122.7092, 772.0363, 864.466, 2095.878]
2025-05-11 12:21:45,383 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:21:45,383 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (1814.32) for latency MM1Queue_a033_s075
2025-05-11 12:21:45,384 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 12:21:45,387 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 12:21:45,399 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 3 hours, 37 minutes, 21 seconds)
2025-05-11 12:24:35,489 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 12:24:49,444 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2480.99268 ± 854.163
2025-05-11 12:24:49,444 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2711.7107, 2562.3499, 2823.7815, 2582.6375, 2759.6074, 2953.1238, 2879.3481, 2772.383, -58.120068, 2823.1047]
2025-05-11 12:24:49,444 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:24:49,445 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (2480.99) for latency MM1Queue_a033_s075
2025-05-11 12:24:49,445 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 12:24:49,449 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 12:24:49,461 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 3 hours, 34 minutes, 42 seconds)
2025-05-11 12:27:39,221 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 12:27:53,153 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2678.91846 ± 261.705
2025-05-11 12:27:53,153 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2847.993, 3030.0205, 2507.0874, 2765.832, 2527.164, 2485.5278, 2984.2021, 2632.641, 2878.0193, 2130.6956]
2025-05-11 12:27:53,153 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:27:53,153 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (2678.92) for latency MM1Queue_a033_s075
2025-05-11 12:27:53,153 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 12:27:53,158 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 12:27:53,170 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 3 hours, 31 minutes, 33 seconds)
2025-05-11 12:30:42,894 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 12:30:56,869 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2456.37036 ± 950.380
2025-05-11 12:30:56,869 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2908.8838, 2919.305, 2694.3318, 2669.8113, 2852.8005, -347.5563, 2430.2444, 2564.1582, 3035.0662, 2836.6582]
2025-05-11 12:30:56,869 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:30:56,875 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 3 hours, 28 minutes, 30 seconds)
2025-05-11 12:33:46,451 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 12:34:00,457 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2184.45557 ± 980.760
2025-05-11 12:34:00,458 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2591.4902, 2745.152, 2768.599, 466.17404, 2771.1067, 2788.2285, 25.459301, 2491.653, 2736.5022, 2460.19]
2025-05-11 12:34:00,458 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:34:00,465 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 3 hours, 25 minutes, 20 seconds)
2025-05-11 12:36:50,864 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 12:37:04,851 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2622.78247 ± 643.856
2025-05-11 12:37:04,852 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2806.2197, 863.0058, 2271.1133, 2699.667, 2927.4126, 3329.3318, 2664.6606, 2671.0508, 2933.4158, 3061.948]
2025-05-11 12:37:04,852 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:37:04,860 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 3 hours, 22 minutes, 16 seconds)
2025-05-11 12:39:54,690 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 12:40:08,683 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2524.55908 ± 672.244
2025-05-11 12:40:08,683 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2635.4504, 2984.461, 2767.1282, 3054.5396, 2994.7446, 2707.9827, 2547.2231, 1777.076, 2968.4163, 808.57074]
2025-05-11 12:40:08,684 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:40:08,691 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 3 hours, 19 minutes, 9 seconds)
2025-05-11 12:42:58,138 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 12:43:12,008 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2757.45874 ± 186.915
2025-05-11 12:43:12,008 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2932.8435, 2427.665, 2824.0894, 2651.9192, 2966.9475, 2934.5234, 2987.8582, 2665.7185, 2556.4805, 2626.5432]
2025-05-11 12:43:12,008 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:43:12,008 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (2757.46) for latency MM1Queue_a033_s075
2025-05-11 12:43:12,009 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 12:43:12,013 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 12:43:12,026 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 3 hours, 16 minutes, 1 second)
2025-05-11 12:46:01,169 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 12:46:15,093 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2974.56567 ± 149.680
2025-05-11 12:46:15,093 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3138.1423, 2709.569, 2884.341, 3212.5078, 2937.5242, 3170.4766, 2967.7266, 2967.3352, 2833.23, 2924.8037]
2025-05-11 12:46:15,093 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:46:15,094 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (2974.57) for latency MM1Queue_a033_s075
2025-05-11 12:46:15,094 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 12:46:15,097 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 12:46:15,135 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 3 hours, 12 minutes, 50 seconds)
2025-05-11 12:49:04,391 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 12:49:18,220 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2863.00488 ± 188.781
2025-05-11 12:49:18,220 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2802.6602, 2945.871, 3110.8948, 2477.203, 3113.6482, 2737.2656, 2687.6357, 2829.1687, 2912.2915, 3013.4097]
2025-05-11 12:49:18,220 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:49:18,229 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 3 hours, 9 minutes, 40 seconds)
2025-05-11 12:52:06,761 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 12:52:20,658 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2576.07666 ± 941.030
2025-05-11 12:52:20,658 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2296.8103, 3106.8254, 2745.354, -143.29213, 2999.8765, 2920.7937, 2767.497, 3222.3972, 2722.7722, 3121.7332]
2025-05-11 12:52:20,658 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:52:20,666 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 3 hours, 6 minutes, 12 seconds)
2025-05-11 12:55:08,812 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 12:55:23,273 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2734.41577 ± 436.741
2025-05-11 12:55:23,273 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2900.9602, 3102.3845, 2903.3262, 2786.8994, 2416.6885, 3120.957, 2623.8748, 2997.8179, 2917.6414, 1573.6102]
2025-05-11 12:55:23,273 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:55:23,281 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 3 hours, 2 minutes, 55 seconds)
2025-05-11 12:58:16,594 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 12:58:30,564 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2524.81274 ± 743.937
2025-05-11 12:58:30,565 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2790.003, 2787.7605, 2907.6677, 1776.0997, 3119.283, 594.05444, 2965.1946, 2802.7239, 3092.9487, 2412.3909]
2025-05-11 12:58:30,565 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:58:30,574 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 3 hours, 38 seconds)
2025-05-11 13:01:19,331 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 13:01:33,169 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2559.41431 ± 770.747
2025-05-11 13:01:33,169 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2941.4224, 2945.161, 3081.251, 2965.0835, 3030.7935, 2707.8977, 643.37286, 2940.3306, 2819.081, 1519.7478]
2025-05-11 13:01:33,169 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:01:33,177 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 2 hours, 57 minutes, 29 seconds)
2025-05-11 13:04:21,103 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 13:04:34,886 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2796.60645 ± 585.887
2025-05-11 13:04:34,886 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2984.2974, 3198.545, 2919.8037, 3023.3977, 2928.0461, 2957.484, 2935.403, 2926.3035, 1055.3424, 3037.4438]
2025-05-11 13:04:34,886 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:04:34,896 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 2 hours, 54 minutes, 10 seconds)
2025-05-11 13:07:23,010 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 13:07:36,816 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2890.97119 ± 184.251
2025-05-11 13:07:36,816 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3124.8667, 2975.4043, 2863.3032, 2907.6753, 2586.2854, 2695.6255, 2850.2798, 3255.5876, 2859.9507, 2790.735]
2025-05-11 13:07:36,817 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:07:36,826 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 2 hours, 51 minutes)
2025-05-11 13:10:21,922 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 13:10:35,736 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2413.95459 ± 942.701
2025-05-11 13:10:35,736 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2829.6047, 2994.8186, 98.672554, 2643.9028, 2897.5974, 2967.5781, 2858.097, 2768.5547, 1086.2346, 2994.4858]
2025-05-11 13:10:35,736 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:10:35,746 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 2 hours, 47 minutes, 17 seconds)
2025-05-11 13:13:20,722 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 13:13:34,574 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2484.64697 ± 842.821
2025-05-11 13:13:34,575 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2834.4043, 1285.5852, 2906.4045, 2574.535, 2950.178, 3001.3438, 2930.893, 435.4391, 3053.5354, 2874.1538]
2025-05-11 13:13:34,575 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:13:34,584 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 2 hours, 42 minutes, 43 seconds)
2025-05-11 13:16:20,719 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 13:16:34,483 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2915.18286 ± 147.456
2025-05-11 13:16:34,483 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2946.3958, 3075.9702, 2733.8584, 3025.6243, 3193.7283, 2733.0752, 2748.0652, 2858.6345, 2960.8696, 2875.6084]
2025-05-11 13:16:34,484 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:16:34,494 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 2 hours, 39 minutes, 13 seconds)
2025-05-11 13:19:21,004 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 13:19:34,803 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2646.65381 ± 620.488
2025-05-11 13:19:34,803 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3166.5115, 2937.103, 1267.1838, 1637.1049, 2784.8652, 2911.4333, 2649.0293, 3075.335, 2911.5928, 3126.3767]
2025-05-11 13:19:34,803 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:19:34,813 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 2 hours, 35 minutes, 59 seconds)
2025-05-11 13:22:20,013 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 13:22:33,805 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2963.46338 ± 98.311
2025-05-11 13:22:33,805 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3020.323, 2773.873, 2872.9707, 2898.0505, 3159.5515, 2935.0913, 3023.3738, 2976.7437, 3007.875, 2966.7812]
2025-05-11 13:22:33,805 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:22:33,815 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 2 hours, 32 minutes, 29 seconds)
2025-05-11 13:25:18,914 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 13:25:32,731 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2686.44678 ± 748.056
2025-05-11 13:25:32,731 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2798.9907, 2946.6968, 2747.2734, 2951.0464, 3113.2358, 2580.0203, 501.52747, 3183.4377, 2979.8801, 3062.359]
2025-05-11 13:25:32,731 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:25:32,741 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 2 hours, 29 minutes, 29 seconds)
2025-05-11 13:28:18,515 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 13:28:32,363 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2662.18286 ± 732.667
2025-05-11 13:28:32,363 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2855.4497, 2917.811, 2847.3206, 2950.1257, 3026.6604, 2892.4272, 2822.126, 3139.2065, 492.18915, 2678.5112]
2025-05-11 13:28:32,363 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:28:32,374 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 2 hours, 26 minutes, 38 seconds)
2025-05-11 13:31:18,691 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 13:31:32,488 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2949.38696 ± 181.932
2025-05-11 13:31:32,488 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2976.0225, 3067.6106, 2874.9277, 2942.9502, 2524.3923, 3278.919, 3044.1863, 2954.8315, 2843.843, 2986.1853]
2025-05-11 13:31:32,488 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:31:32,499 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 2 hours, 23 minutes, 40 seconds)
2025-05-11 13:34:18,174 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 13:34:31,943 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2929.36499 ± 93.294
2025-05-11 13:34:31,944 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2787.3044, 2880.92, 3049.7744, 2877.9485, 3016.2175, 2887.1138, 2837.7085, 2889.4238, 2977.4746, 3089.766]
2025-05-11 13:34:31,944 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:34:31,954 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 2 hours, 20 minutes, 33 seconds)
2025-05-11 13:37:17,163 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 13:37:30,964 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2639.47754 ± 867.606
2025-05-11 13:37:30,964 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2910.4856, 2870.818, 2926.8132, 3063.4084, 2904.2327, 2991.543, 3027.8337, 2978.7876, 55.395756, 2665.4568]
2025-05-11 13:37:30,964 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:37:30,976 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 2 hours, 17 minutes, 33 seconds)
2025-05-11 13:40:17,519 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 13:40:31,266 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2867.64307 ± 164.741
2025-05-11 13:40:31,266 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3008.999, 2828.042, 2812.7344, 2929.6907, 2881.8577, 2712.935, 2922.0862, 2904.0127, 2511.842, 3164.2295]
2025-05-11 13:40:31,266 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:40:31,277 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 2 hours, 14 minutes, 46 seconds)
2025-05-11 13:43:20,088 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 13:43:34,112 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2977.40723 ± 218.796
2025-05-11 13:43:34,112 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2810.8936, 2963.4583, 2860.1868, 3051.426, 3212.4207, 2462.2642, 3149.1646, 3226.362, 3108.5054, 2929.391]
2025-05-11 13:43:34,112 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:43:34,113 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (2977.41) for latency MM1Queue_a033_s075
2025-05-11 13:43:34,113 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 13:43:34,117 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 13:43:34,158 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 2 hours, 12 minutes, 15 seconds)
2025-05-11 13:46:23,448 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 13:46:37,312 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2596.56787 ± 871.845
2025-05-11 13:46:37,312 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2670.9204, 2423.7422, 3013.855, 3034.7195, 2943.796, 2987.49, 88.63878, 2659.5405, 2770.2703, 3372.7092]
2025-05-11 13:46:37,313 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:46:37,325 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 2 hours, 9 minutes, 41 seconds)
2025-05-11 13:49:26,097 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 13:49:39,346 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2885.39575 ± 229.313
2025-05-11 13:49:39,346 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2635.576, 2980.8909, 2784.1748, 2663.6729, 2533.2188, 2968.8677, 3302.6392, 3163.3237, 2835.6833, 2985.9092]
2025-05-11 13:49:39,347 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:49:39,357 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 2 hours, 7 minutes, 2 seconds)
2025-05-11 13:52:30,583 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 13:52:47,940 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2750.62256 ± 413.498
2025-05-11 13:52:47,940 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2647.9875, 2928.0166, 2879.5151, 2666.413, 3166.1165, 3166.5164, 2818.2493, 2870.415, 2743.537, 1619.4615]
2025-05-11 13:52:47,940 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:52:47,956 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 5 minutes, 19 seconds)
2025-05-11 13:56:19,938 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 13:56:37,338 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2651.53369 ± 715.112
2025-05-11 13:56:37,338 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2978.6895, 3009.7864, 3036.8394, 3095.5283, 2891.97, 2326.1553, 3023.4624, 2655.5005, 2889.022, 608.3838]
2025-05-11 13:56:37,338 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:56:37,355 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 2 hours, 8 minutes, 48 seconds)
2025-05-11 13:59:41,689 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 13:59:59,102 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3034.07788 ± 225.350
2025-05-11 13:59:59,102 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2847.527, 3047.2866, 2881.0918, 2847.4204, 3004.4736, 3324.48, 3346.9753, 3345.9094, 3027.886, 2667.729]
2025-05-11 13:59:59,102 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:59:59,103 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (3034.08) for latency MM1Queue_a033_s075
2025-05-11 13:59:59,103 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 13:59:59,109 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 13:59:59,132 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 2 hours, 8 minutes, 2 seconds)
2025-05-11 14:03:11,885 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 14:03:29,263 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2959.02368 ± 200.228
2025-05-11 14:03:29,263 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3006.7168, 3076.6497, 2871.4102, 3112.5986, 2963.071, 2735.684, 2676.8997, 3218.4458, 3244.2856, 2684.475]
2025-05-11 14:03:29,263 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:03:29,280 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 2 hours, 8 minutes, 10 seconds)
2025-05-11 14:06:59,229 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 14:07:16,577 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3046.74072 ± 168.892
2025-05-11 14:07:16,577 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2944.348, 3376.9094, 3005.7935, 3158.0605, 2932.7754, 3161.0088, 3063.716, 3148.9695, 2955.5532, 2720.274]
2025-05-11 14:07:16,577 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:07:16,577 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (3046.74) for latency MM1Queue_a033_s075
2025-05-11 14:07:16,578 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 14:07:16,582 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 14:07:16,606 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 2 hours, 10 minutes, 23 seconds)
2025-05-11 14:10:48,419 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 14:11:05,757 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2963.22290 ± 655.625
2025-05-11 14:11:05,757 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3320.005, 3179.5027, 3121.359, 3239.4556, 3133.6592, 3235.6924, 3234.7314, 3024.4812, 3132.936, 1010.4054]
2025-05-11 14:11:05,757 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:11:05,775 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 2 hours, 11 minutes, 44 seconds)
2025-05-11 14:14:20,115 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 14:14:34,051 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2982.53882 ± 160.679
2025-05-11 14:14:34,052 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2688.969, 3003.8062, 3193.5994, 3210.2969, 3045.297, 3041.727, 3000.393, 2950.5957, 2725.5608, 2965.1418]
2025-05-11 14:14:34,052 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:14:34,065 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 2 hours, 5 minutes, 36 seconds)
2025-05-11 14:17:26,483 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 14:17:41,027 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3107.70923 ± 162.092
2025-05-11 14:17:41,027 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2900.7024, 3118.481, 2782.666, 3182.849, 2974.8364, 3277.3335, 3118.7678, 3194.9395, 3208.8352, 3317.679]
2025-05-11 14:17:41,027 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:17:41,028 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (3107.71) for latency MM1Queue_a033_s075
2025-05-11 14:17:41,028 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 14:17:41,032 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 14:17:41,051 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 2 hours, 21 seconds)
2025-05-11 14:20:31,690 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 14:20:45,496 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2978.48755 ± 140.042
2025-05-11 14:20:45,496 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2904.0933, 3082.8691, 3096.173, 3030.0413, 3140.1047, 2758.6646, 2860.0044, 2897.3594, 3189.9944, 2825.5723]
2025-05-11 14:20:45,496 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:20:45,510 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 53 minutes, 59 seconds)
2025-05-11 14:23:33,372 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 14:23:47,043 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2999.23608 ± 84.075
2025-05-11 14:23:47,043 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2972.538, 3126.437, 2981.5178, 3076.9785, 2929.3748, 2902.7703, 2883.9385, 3055.2495, 2946.0442, 3117.511]
2025-05-11 14:23:47,043 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:23:47,056 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 45 minutes, 38 seconds)
2025-05-11 14:26:36,483 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 14:26:49,907 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3045.70703 ± 227.469
2025-05-11 14:26:49,907 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2971.6, 3367.2244, 3285.0898, 2614.702, 2747.0156, 2910.3508, 3092.1758, 3256.1394, 3060.873, 3151.9006]
2025-05-11 14:26:49,907 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:26:49,920 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 37 minutes, 33 seconds)
2025-05-11 14:29:39,134 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 14:29:52,261 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3029.00586 ± 191.469
2025-05-11 14:29:52,261 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2812.9739, 3087.7559, 2875.2632, 3072.4458, 3103.5134, 3218.948, 3358.9104, 3089.875, 3010.7703, 2659.6064]
2025-05-11 14:29:52,261 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:29:52,274 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 31 minutes, 49 seconds)
2025-05-11 14:32:41,500 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 14:32:54,405 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3163.45239 ± 122.963
2025-05-11 14:32:54,406 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3356.401, 3137.0854, 3064.2063, 3203.2708, 3150.9446, 3264.5483, 3181.4668, 3311.979, 2940.7588, 3023.8635]
2025-05-11 14:32:54,406 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:32:54,406 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (3163.45) for latency MM1Queue_a033_s075
2025-05-11 14:32:54,406 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 14:32:54,410 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 14:32:54,427 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 28 minutes, 17 seconds)
2025-05-11 14:35:43,358 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 14:35:56,193 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2861.11646 ± 672.805
2025-05-11 14:35:56,194 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2958.6306, 3345.5916, 2929.9612, 3109.9175, 879.26105, 3211.5032, 2989.6748, 3065.0798, 3177.8762, 2943.6665]
2025-05-11 14:35:56,194 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:35:56,207 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 24 minutes, 59 seconds)
2025-05-11 14:38:45,511 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 14:38:58,184 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3089.39209 ± 227.912
2025-05-11 14:38:58,184 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3247.519, 3178.922, 3177.334, 2539.9272, 2885.6472, 3043.9502, 3202.4167, 2995.9038, 3392.788, 3229.512]
2025-05-11 14:38:58,184 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:38:58,198 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 22 minutes)
2025-05-11 14:41:47,894 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 14:42:00,469 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3048.12988 ± 202.308
2025-05-11 14:42:00,469 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3089.4036, 2814.671, 3453.365, 2949.6167, 2885.2942, 2929.1626, 2963.2656, 3117.8752, 2905.0225, 3373.6226]
2025-05-11 14:42:00,470 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:42:00,484 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 18 minutes, 54 seconds)
2025-05-11 14:44:50,202 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 14:45:02,762 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3114.50439 ± 177.868
2025-05-11 14:45:02,762 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2970.1843, 3315.8586, 3266.486, 3113.0342, 2807.0137, 3289.7637, 2832.3445, 3267.411, 3108.4443, 3174.4993]
2025-05-11 14:45:02,762 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:45:02,777 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 15 minutes, 52 seconds)
2025-05-11 14:47:51,991 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 14:48:04,726 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3017.33447 ± 543.935
2025-05-11 14:48:04,726 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2991.6057, 3045.2307, 3413.6208, 3428.867, 3145.1785, 1458.8599, 3191.1694, 3005.2202, 3088.755, 3404.837]
2025-05-11 14:48:04,727 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:48:04,742 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 12 minutes, 49 seconds)
2025-05-11 14:50:54,699 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 14:51:07,744 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2969.02588 ± 591.037
2025-05-11 14:51:07,744 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3288.9128, 2967.4236, 3142.5928, 3219.4382, 3034.0857, 3224.712, 3125.6616, 3323.128, 3141.3728, 1222.9304]
2025-05-11 14:51:07,744 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:51:07,759 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 9 minutes, 53 seconds)
2025-05-11 14:53:57,220 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 14:54:10,523 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3074.45264 ± 131.107
2025-05-11 14:54:10,524 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3119.0632, 3157.8765, 3173.9695, 3023.505, 3202.3716, 2748.3735, 3195.1787, 3087.2014, 2960.3923, 3076.5962]
2025-05-11 14:54:10,524 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:54:10,539 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 6 minutes, 54 seconds)
2025-05-11 14:56:59,258 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 14:57:12,814 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2962.81641 ± 456.838
2025-05-11 14:57:12,814 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3083.3445, 3090.0188, 3261.1917, 2350.8457, 3250.8235, 3348.9297, 3379.885, 3052.1926, 1877.4868, 2933.448]
2025-05-11 14:57:12,814 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:57:12,830 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 3 minutes, 51 seconds)
2025-05-11 15:00:01,103 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 15:00:14,777 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3204.35352 ± 140.029
2025-05-11 15:00:14,778 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2858.7576, 3135.225, 3221.6184, 3280.0703, 3311.1116, 3081.9297, 3199.4875, 3302.7014, 3325.0103, 3327.6223]
2025-05-11 15:00:14,778 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 15:00:14,778 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (3204.35) for latency MM1Queue_a033_s075
2025-05-11 15:00:14,778 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 15:00:14,783 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 15:00:14,804 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 48 seconds)
2025-05-11 15:03:03,410 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 15:03:17,275 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3156.79639 ± 157.719
2025-05-11 15:03:17,275 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3187.6418, 3223.9375, 3398.9727, 3041.053, 3327.6584, 2880.3193, 3028.7522, 3287.214, 2984.7021, 3207.7146]
2025-05-11 15:03:17,275 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 15:03:17,291 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 57 minutes, 47 seconds)
2025-05-11 15:06:06,323 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 15:06:19,599 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3062.73975 ± 125.117
2025-05-11 15:06:19,599 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2928.9893, 3233.9521, 3080.7307, 3032.2976, 3134.1807, 3253.3374, 3162.0571, 2924.0935, 2880.186, 2997.5735]
2025-05-11 15:06:19,599 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 15:06:19,615 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 54 minutes, 42 seconds)
2025-05-11 15:09:05,959 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 15:09:19,268 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2347.88696 ± 1291.141
2025-05-11 15:09:19,269 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3188.0513, 220.87027, 3177.722, 185.23108, 3244.055, 767.42444, 3146.9556, 3044.308, 3165.8743, 3338.3767]
2025-05-11 15:09:19,269 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 15:09:19,284 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 51 minutes, 29 seconds)
2025-05-11 15:12:04,481 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 15:12:17,758 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3152.86011 ± 189.434
2025-05-11 15:12:17,758 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3013.8325, 3347.1719, 3351.2456, 3328.1934, 3300.3257, 2754.8645, 2988.6436, 3220.463, 3020.7317, 3203.129]
2025-05-11 15:12:17,758 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 15:12:17,774 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 48 minutes, 15 seconds)
2025-05-11 15:15:02,262 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 15:15:15,518 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2869.43091 ± 907.013
2025-05-11 15:15:15,518 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3097.2913, 3060.4578, 2824.6257, 3032.7683, 3244.751, 3371.537, 3160.5576, 3395.384, 3313.0405, 193.89737]
2025-05-11 15:15:15,518 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 15:15:15,534 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 45 minutes, 2 seconds)
2025-05-11 15:18:00,435 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 15:18:13,723 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3225.74463 ± 212.729
2025-05-11 15:18:13,723 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3310.4272, 3537.192, 3187.6475, 2988.7695, 3395.5576, 3325.8857, 2855.6187, 3421.7388, 3280.0427, 2954.5654]
2025-05-11 15:18:13,723 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 15:18:13,723 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1226 [INFO]: New best (3225.74) for latency MM1Queue_a033_s075
2025-05-11 15:18:13,724 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 15:18:13,728 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 15:18:13,748 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 41 minutes, 50 seconds)
2025-05-11 15:20:59,673 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 15:21:12,974 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2846.90454 ± 839.164
2025-05-11 15:21:12,974 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2957.37, 2962.2854, 2978.6306, 3225.9543, 3312.484, 360.29654, 3113.4937, 3285.4202, 3246.7195, 3026.3933]
2025-05-11 15:21:12,975 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 15:21:12,991 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 38 minutes, 42 seconds)
2025-05-11 15:23:58,984 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 15:24:12,288 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2790.35229 ± 778.850
2025-05-11 15:24:12,288 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3200.1785, 1841.6558, 3018.465, 3300.0232, 3434.5378, 3474.1687, 3071.3984, 3024.9097, 879.2128, 2658.974]
2025-05-11 15:24:12,288 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 15:24:12,305 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 35 minutes, 43 seconds)
2025-05-11 15:26:58,244 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 15:27:11,527 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3066.50439 ± 166.374
2025-05-11 15:27:11,527 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3059.9065, 3038.5964, 2913.8652, 3156.8784, 2840.505, 3003.3174, 3158.4622, 2846.0012, 3321.45, 3326.0645]
2025-05-11 15:27:11,527 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 15:27:11,544 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 32 minutes, 46 seconds)
2025-05-11 15:29:57,606 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 15:30:10,922 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2933.32300 ± 669.055
2025-05-11 15:30:10,922 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3256.658, 2939.2122, 3112.62, 3540.5383, 3315.0615, 3162.6938, 1034.8136, 2821.8064, 2834.3906, 3315.4373]
2025-05-11 15:30:10,922 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 15:30:10,939 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 29 minutes, 50 seconds)
2025-05-11 15:32:56,505 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 15:33:09,762 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3100.27588 ± 116.256
2025-05-11 15:33:09,762 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2952.0647, 3293.79, 2950.9707, 3068.582, 3069.2234, 3119.55, 3256.5708, 3219.1223, 3083.3784, 2989.5056]
2025-05-11 15:33:09,762 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 15:33:09,779 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 26 minutes, 52 seconds)
2025-05-11 15:35:54,728 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 15:36:07,882 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3132.97559 ± 273.950
2025-05-11 15:36:07,883 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3469.8652, 3211.5525, 3118.4353, 3317.9836, 3234.9678, 3151.078, 3132.7175, 3069.7817, 3244.8174, 2378.5574]
2025-05-11 15:36:07,883 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 15:36:07,899 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 23 minutes, 51 seconds)
2025-05-11 15:38:53,317 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 15:39:06,420 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3142.71826 ± 133.054
2025-05-11 15:39:06,420 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3034.1643, 3334.7854, 2960.7117, 3084.7524, 3281.232, 3020.987, 3060.4597, 3092.699, 3202.473, 3354.92]
2025-05-11 15:39:06,420 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 15:39:06,437 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 20 minutes, 51 seconds)
2025-05-11 15:41:51,563 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 15:42:04,739 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2981.79883 ± 524.557
2025-05-11 15:42:04,739 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3157.1108, 3197.0225, 2078.472, 3531.44, 3024.879, 3310.8474, 3282.5942, 3134.657, 1857.519, 3243.4463]
2025-05-11 15:42:04,739 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 15:42:04,756 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 17 minutes, 51 seconds)
2025-05-11 15:44:49,870 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 15:45:03,047 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2832.37329 ± 704.497
2025-05-11 15:45:03,048 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3471.9546, 2944.885, 3186.0645, 2002.0724, 3049.8235, 3186.3953, 3173.8958, 1033.1108, 3054.6655, 3220.863]
2025-05-11 15:45:03,048 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 15:45:03,078 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 14 minutes, 52 seconds)
2025-05-11 15:47:49,103 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 15:48:02,919 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3103.85693 ± 220.432
2025-05-11 15:48:02,919 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3114.6013, 3034.21, 2971.9941, 3025.141, 3448.9897, 2690.5144, 3399.486, 3276.4119, 3182.9988, 2894.222]
2025-05-11 15:48:02,920 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 15:48:02,939 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 11 minutes, 54 seconds)
2025-05-11 15:50:54,894 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 15:51:08,843 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2976.48584 ± 1049.126
2025-05-11 15:51:08,843 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2716.4785, 3477.2607, 3393.1497, 3415.7686, 3309.5366, 3558.5557, 3376.1782, 3136.6309, -95.38211, 3476.6812]
2025-05-11 15:51:08,843 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 15:51:08,862 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 9 minutes)
2025-05-11 15:53:59,523 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 15:54:13,419 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2753.81934 ± 970.510
2025-05-11 15:54:13,419 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3586.3748, 2873.638, 3110.224, 237.69461, 3281.761, 3124.822, 3246.4233, 3356.5603, 3039.1382, 1681.5562]
2025-05-11 15:54:13,419 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 15:54:13,438 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 6 minutes, 2 seconds)
2025-05-11 15:57:03,944 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 15:57:17,769 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2874.17920 ± 782.622
2025-05-11 15:57:17,769 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [585.7271, 3016.0671, 3294.8704, 3034.4102, 3186.031, 3085.5413, 3315.4995, 2771.647, 3413.6257, 3038.3752]
2025-05-11 15:57:17,770 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 15:57:17,789 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 3 minutes, 2 seconds)
2025-05-11 16:00:08,332 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 16:00:22,257 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2844.31372 ± 898.175
2025-05-11 16:00:22,257 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2838.176, 3111.093, 3274.325, 3149.943, 3234.175, 216.30045, 2726.6587, 3169.9106, 3312.07, 3410.485]
2025-05-11 16:00:22,257 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 16:00:22,277 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-halfcheetah):1251 [DEBUG]: Training session finished
