2025-05-10 11:38:28,196 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-sac-aug-mem16
2025-05-10 11:38:28,196 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-sac-aug-mem16
2025-05-10 11:38:28,196 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x79dfabc40f70>}
2025-05-10 11:38:28,196 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1111 [DEBUG]: using device: cpu
2025-05-10 11:38:28,196 baseline-sac-noisy-hopper:77 [WARNING]: args.memorize_actions != args.horizon: 16 != 24
2025-05-10 11:38:28,201 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1133 [INFO]: Creating new trainer
2025-05-10 11:38:28,207 baseline-sac-noisy-hopper:111 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=59, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2.]]), shift: tensor([[-1., -1., -1.]]))
)
2025-05-10 11:38:28,207 baseline-sac-noisy-hopper:112 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=62, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-10 11:38:28,384 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1194 [DEBUG]: Starting training session...
2025-05-10 11:38:28,384 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 1/100
2025-05-10 11:40:54,312 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 11:40:55,044 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 78.19428 ± 11.531
2025-05-10 11:40:55,044 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [78.28561, 81.36683, 89.88187, 81.1713, 88.7943, 46.947243, 86.03812, 76.46981, 73.73963, 79.24813]
2025-05-10 11:40:55,045 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [50.0, 53.0, 56.0, 47.0, 56.0, 35.0, 54.0, 50.0, 49.0, 54.0]
2025-05-10 11:40:55,045 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1226 [INFO]: New best (78.19) for latency MM1Queue_a033_s075
2025-05-10 11:40:55,045 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1229 [INFO]: saving network
2025-05-10 11:40:55,049 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 11:40:55,054 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 2/100 (estimated time remaining: 4 hours, 2 minutes)
2025-05-10 11:43:29,007 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 11:43:30,091 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 47.29815 ± 10.903
2025-05-10 11:43:30,091 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [50.385456, 43.671227, 46.418633, 67.93199, 65.70446, 36.05403, 43.649624, 46.531193, 33.20973, 39.425186]
2025-05-10 11:43:30,091 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [74.0, 70.0, 72.0, 93.0, 91.0, 60.0, 73.0, 66.0, 61.0, 67.0]
2025-05-10 11:43:30,093 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 3/100 (estimated time remaining: 4 hours, 6 minutes, 23 seconds)
2025-05-10 11:46:03,952 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 11:46:05,928 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 168.23755 ± 17.106
2025-05-10 11:46:05,928 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [194.23276, 162.31828, 147.46974, 175.71393, 143.51074, 176.68134, 158.75752, 165.2111, 160.47246, 198.00768]
2025-05-10 11:46:05,928 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [163.0, 125.0, 112.0, 136.0, 109.0, 137.0, 126.0, 133.0, 133.0, 147.0]
2025-05-10 11:46:05,928 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1226 [INFO]: New best (168.24) for latency MM1Queue_a033_s075
2025-05-10 11:46:05,929 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1229 [INFO]: saving network
2025-05-10 11:46:05,932 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 11:46:05,938 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 4/100 (estimated time remaining: 4 hours, 6 minutes, 34 seconds)
2025-05-10 11:48:41,088 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 11:48:42,957 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 202.60898 ± 101.844
2025-05-10 11:48:42,958 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [31.833593, 229.32928, 115.169426, 285.92755, 174.98778, 252.21815, 330.01788, 56.77359, 211.37361, 338.45886]
2025-05-10 11:48:42,958 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [32.0, 138.0, 86.0, 163.0, 113.0, 146.0, 188.0, 54.0, 126.0, 201.0]
2025-05-10 11:48:42,958 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1226 [INFO]: New best (202.61) for latency MM1Queue_a033_s075
2025-05-10 11:48:42,958 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1229 [INFO]: saving network
2025-05-10 11:48:42,962 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 11:48:42,968 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 5/100 (estimated time remaining: 4 hours, 5 minutes, 50 seconds)
2025-05-10 11:51:18,846 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 11:51:20,038 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 134.69212 ± 18.536
2025-05-10 11:51:20,038 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [107.39496, 101.55071, 148.38177, 139.6712, 144.68063, 168.98878, 129.36302, 141.35017, 136.24225, 129.29768]
2025-05-10 11:51:20,038 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [71.0, 66.0, 89.0, 85.0, 93.0, 87.0, 79.0, 84.0, 83.0, 80.0]
2025-05-10 11:51:20,040 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 6/100 (estimated time remaining: 4 hours, 4 minutes, 21 seconds)
2025-05-10 11:53:56,331 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 11:53:57,917 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 202.44806 ± 48.308
2025-05-10 11:53:57,917 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [267.10205, 210.20587, 230.67615, 102.61027, 173.79158, 210.26181, 264.63174, 143.76094, 211.61192, 209.82808]
2025-05-10 11:53:57,918 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [130.0, 113.0, 121.0, 66.0, 93.0, 115.0, 133.0, 84.0, 103.0, 114.0]
2025-05-10 11:53:57,919 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 7/100 (estimated time remaining: 4 hours, 5 minutes, 17 seconds)
2025-05-10 11:56:32,094 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 11:56:33,881 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 232.63681 ± 33.275
2025-05-10 11:56:33,881 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [236.60376, 242.21419, 252.91446, 238.95056, 249.2392, 246.97992, 133.90274, 244.61449, 243.42412, 237.52466]
2025-05-10 11:56:33,881 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [118.0, 124.0, 128.0, 127.0, 128.0, 123.0, 79.0, 132.0, 124.0, 121.0]
2025-05-10 11:56:33,882 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1226 [INFO]: New best (232.64) for latency MM1Queue_a033_s075
2025-05-10 11:56:33,882 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1229 [INFO]: saving network
2025-05-10 11:56:33,886 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 11:56:33,892 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 8/100 (estimated time remaining: 4 hours, 2 minutes, 58 seconds)
2025-05-10 11:59:09,880 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 11:59:11,983 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 304.86472 ± 19.666
2025-05-10 11:59:11,983 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [260.90112, 309.26218, 301.04337, 291.22745, 316.64517, 298.08633, 337.43567, 296.9532, 322.5687, 314.52408]
2025-05-10 11:59:11,983 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [135.0, 147.0, 149.0, 125.0, 144.0, 127.0, 148.0, 134.0, 160.0, 138.0]
2025-05-10 11:59:11,983 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1226 [INFO]: New best (304.86) for latency MM1Queue_a033_s075
2025-05-10 11:59:11,983 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1229 [INFO]: saving network
2025-05-10 11:59:11,987 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 11:59:11,994 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 9/100 (estimated time remaining: 4 hours, 1 minute, 3 seconds)
2025-05-10 12:01:49,050 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 12:01:50,851 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 217.63969 ± 57.321
2025-05-10 12:01:50,851 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [214.07079, 259.2579, 58.821575, 206.6458, 221.45302, 216.15923, 269.18088, 265.72534, 219.53033, 245.55235]
2025-05-10 12:01:50,851 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [107.0, 127.0, 41.0, 127.0, 137.0, 130.0, 153.0, 125.0, 114.0, 154.0]
2025-05-10 12:01:50,854 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 10/100 (estimated time remaining: 3 hours, 58 minutes, 59 seconds)
2025-05-10 12:04:26,721 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 12:04:28,958 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 340.09683 ± 28.464
2025-05-10 12:04:28,958 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [327.5472, 365.85126, 333.16425, 335.6153, 275.82437, 361.89594, 356.75925, 318.18985, 343.12656, 382.99442]
2025-05-10 12:04:28,958 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [149.0, 163.0, 150.0, 155.0, 132.0, 153.0, 154.0, 143.0, 158.0, 158.0]
2025-05-10 12:04:28,958 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1226 [INFO]: New best (340.10) for latency MM1Queue_a033_s075
2025-05-10 12:04:28,958 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1229 [INFO]: saving network
2025-05-10 12:04:28,962 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 12:04:28,969 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 11/100 (estimated time remaining: 3 hours, 56 minutes, 40 seconds)
2025-05-10 12:07:03,038 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 12:07:05,229 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 343.10461 ± 99.279
2025-05-10 12:07:05,229 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [393.72134, 324.46988, 396.24692, 211.6515, 382.21777, 413.23923, 410.34085, 378.58862, 102.79931, 417.77066]
2025-05-10 12:07:05,230 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [166.0, 146.0, 174.0, 106.0, 158.0, 175.0, 162.0, 164.0, 66.0, 161.0]
2025-05-10 12:07:05,230 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1226 [INFO]: New best (343.10) for latency MM1Queue_a033_s075
2025-05-10 12:07:05,230 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1229 [INFO]: saving network
2025-05-10 12:07:05,234 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 12:07:05,241 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 12/100 (estimated time remaining: 3 hours, 53 minutes, 34 seconds)
2025-05-10 12:09:41,671 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 12:09:43,879 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 317.86237 ± 73.017
2025-05-10 12:09:43,879 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [234.12042, 354.1333, 310.90536, 411.99066, 292.92627, 207.39333, 415.93967, 348.83264, 223.3501, 379.03207]
2025-05-10 12:09:43,879 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [117.0, 160.0, 155.0, 175.0, 144.0, 105.0, 171.0, 164.0, 121.0, 171.0]
2025-05-10 12:09:43,882 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 13/100 (estimated time remaining: 3 hours, 51 minutes, 43 seconds)
2025-05-10 12:12:21,911 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 12:12:24,129 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 359.09430 ± 35.940
2025-05-10 12:12:24,129 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [336.4795, 339.1843, 351.89734, 307.7358, 438.2208, 409.3129, 365.39642, 349.6538, 353.3285, 339.7335]
2025-05-10 12:12:24,129 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [140.0, 155.0, 143.0, 132.0, 175.0, 166.0, 154.0, 137.0, 150.0, 139.0]
2025-05-10 12:12:24,130 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1226 [INFO]: New best (359.09) for latency MM1Queue_a033_s075
2025-05-10 12:12:24,130 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1229 [INFO]: saving network
2025-05-10 12:12:24,134 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 12:12:24,141 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 14/100 (estimated time remaining: 3 hours, 49 minutes, 43 seconds)
2025-05-10 12:15:00,969 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 12:15:03,136 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 376.25690 ± 98.714
2025-05-10 12:15:03,136 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [197.83145, 400.49863, 200.155, 392.51627, 382.13303, 483.85623, 497.6527, 426.80853, 343.48392, 437.6333]
2025-05-10 12:15:03,136 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [98.0, 144.0, 102.0, 150.0, 143.0, 173.0, 178.0, 163.0, 140.0, 164.0]
2025-05-10 12:15:03,136 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1226 [INFO]: New best (376.26) for latency MM1Queue_a033_s075
2025-05-10 12:15:03,137 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1229 [INFO]: saving network
2025-05-10 12:15:03,140 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 12:15:03,148 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 15/100 (estimated time remaining: 3 hours, 47 minutes, 7 seconds)
2025-05-10 12:17:37,085 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 12:17:39,206 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 360.17087 ± 65.371
2025-05-10 12:17:39,206 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [411.29913, 393.72006, 200.92763, 371.91254, 379.91403, 341.24158, 327.9163, 464.07288, 376.17603, 334.5285]
2025-05-10 12:17:39,206 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [152.0, 146.0, 96.0, 153.0, 147.0, 138.0, 130.0, 168.0, 152.0, 141.0]
2025-05-10 12:17:39,209 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 16/100 (estimated time remaining: 3 hours, 43 minutes, 54 seconds)
2025-05-10 12:20:16,047 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 12:20:18,274 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 385.73883 ± 89.551
2025-05-10 12:20:18,275 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [221.50798, 403.00397, 449.17902, 486.4384, 457.04358, 330.37112, 423.95877, 425.22958, 226.7702, 433.88562]
2025-05-10 12:20:18,275 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [108.0, 150.0, 172.0, 164.0, 165.0, 143.0, 158.0, 154.0, 112.0, 169.0]
2025-05-10 12:20:18,275 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1226 [INFO]: New best (385.74) for latency MM1Queue_a033_s075
2025-05-10 12:20:18,275 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1229 [INFO]: saving network
2025-05-10 12:20:18,279 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 12:20:18,305 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 17/100 (estimated time remaining: 3 hours, 42 minutes, 3 seconds)
2025-05-10 12:22:54,948 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 12:22:57,088 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 368.79962 ± 83.626
2025-05-10 12:22:57,089 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [372.16095, 445.27155, 416.88654, 419.4582, 332.2995, 381.80054, 377.1067, 388.91483, 419.14645, 134.95074]
2025-05-10 12:22:57,089 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [135.0, 166.0, 156.0, 161.0, 144.0, 156.0, 136.0, 144.0, 161.0, 83.0]
2025-05-10 12:22:57,092 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 18/100 (estimated time remaining: 3 hours, 39 minutes, 27 seconds)
2025-05-10 12:25:34,309 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 12:25:36,635 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 421.59064 ± 40.466
2025-05-10 12:25:36,635 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [397.6255, 402.20215, 418.4346, 410.82495, 459.8682, 393.8484, 422.77054, 527.2203, 398.6633, 384.44858]
2025-05-10 12:25:36,636 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [144.0, 150.0, 156.0, 154.0, 169.0, 146.0, 156.0, 185.0, 155.0, 148.0]
2025-05-10 12:25:36,636 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1226 [INFO]: New best (421.59) for latency MM1Queue_a033_s075
2025-05-10 12:25:36,636 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1229 [INFO]: saving network
2025-05-10 12:25:36,640 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 12:25:36,648 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 19/100 (estimated time remaining: 3 hours, 36 minutes, 37 seconds)
2025-05-10 12:28:12,076 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 12:28:14,436 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 434.81845 ± 35.058
2025-05-10 12:28:14,436 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [474.01593, 426.1263, 398.52057, 431.84473, 449.3792, 419.33093, 443.89548, 425.047, 505.84717, 374.1772]
2025-05-10 12:28:14,436 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [166.0, 155.0, 145.0, 153.0, 163.0, 156.0, 163.0, 164.0, 180.0, 138.0]
2025-05-10 12:28:14,436 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1226 [INFO]: New best (434.82) for latency MM1Queue_a033_s075
2025-05-10 12:28:14,436 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1229 [INFO]: saving network
2025-05-10 12:28:14,440 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 12:28:14,448 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 20/100 (estimated time remaining: 3 hours, 33 minutes, 39 seconds)
2025-05-10 12:30:50,002 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 12:30:52,501 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 466.88177 ± 44.570
2025-05-10 12:30:52,501 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [393.31403, 474.49118, 453.69955, 538.1149, 498.65338, 467.17215, 492.24478, 511.40945, 443.19714, 396.52148]
2025-05-10 12:30:52,501 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [150.0, 164.0, 166.0, 186.0, 175.0, 167.0, 172.0, 172.0, 155.0, 164.0]
2025-05-10 12:30:52,502 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1226 [INFO]: New best (466.88) for latency MM1Queue_a033_s075
2025-05-10 12:30:52,502 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1229 [INFO]: saving network
2025-05-10 12:30:52,505 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 12:30:52,514 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 21/100 (estimated time remaining: 3 hours, 31 minutes, 32 seconds)
2025-05-10 12:33:29,950 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 12:33:32,442 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 472.57697 ± 153.518
2025-05-10 12:33:32,442 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [522.84485, 423.43344, 592.0262, 598.7301, 605.4156, 621.6507, 433.11932, 409.6079, 80.11991, 438.82147]
2025-05-10 12:33:32,442 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [177.0, 165.0, 189.0, 199.0, 210.0, 195.0, 158.0, 156.0, 58.0, 159.0]
2025-05-10 12:33:32,442 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1226 [INFO]: New best (472.58) for latency MM1Queue_a033_s075
2025-05-10 12:33:32,443 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1229 [INFO]: saving network
2025-05-10 12:33:32,446 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 12:33:32,455 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 22/100 (estimated time remaining: 3 hours, 29 minutes, 7 seconds)
2025-05-10 12:36:07,211 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 12:36:09,877 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 505.39532 ± 38.995
2025-05-10 12:36:09,877 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [501.06046, 540.6505, 417.77118, 542.42114, 489.5646, 486.7431, 516.79785, 548.7838, 471.28833, 538.8723]
2025-05-10 12:36:09,878 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [164.0, 178.0, 160.0, 196.0, 188.0, 179.0, 178.0, 189.0, 161.0, 184.0]
2025-05-10 12:36:09,878 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1226 [INFO]: New best (505.40) for latency MM1Queue_a033_s075
2025-05-10 12:36:09,878 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1229 [INFO]: saving network
2025-05-10 12:36:09,882 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 12:36:09,891 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 23/100 (estimated time remaining: 3 hours, 26 minutes, 7 seconds)
2025-05-10 12:38:48,530 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 12:38:51,059 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 479.51099 ± 80.510
2025-05-10 12:38:51,059 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [429.89572, 343.1827, 541.42944, 621.9421, 594.74994, 473.70773, 458.2718, 406.83774, 458.73212, 466.36078]
2025-05-10 12:38:51,059 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [155.0, 147.0, 181.0, 193.0, 192.0, 172.0, 168.0, 155.0, 169.0, 167.0]
2025-05-10 12:38:51,064 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 24/100 (estimated time remaining: 3 hours, 23 minutes, 54 seconds)
2025-05-10 12:41:24,969 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 12:41:27,916 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 592.22607 ± 55.581
2025-05-10 12:41:27,916 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [609.56305, 609.40216, 495.6463, 618.07605, 633.60675, 480.29324, 597.43915, 641.6413, 651.6968, 584.89606]
2025-05-10 12:41:27,916 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [191.0, 199.0, 174.0, 199.0, 201.0, 164.0, 202.0, 225.0, 211.0, 197.0]
2025-05-10 12:41:27,916 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1226 [INFO]: New best (592.23) for latency MM1Queue_a033_s075
2025-05-10 12:41:27,917 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1229 [INFO]: saving network
2025-05-10 12:41:27,920 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 12:41:27,929 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 25/100 (estimated time remaining: 3 hours, 21 minutes)
2025-05-10 12:44:06,638 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 12:44:09,407 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 536.42462 ± 65.966
2025-05-10 12:44:09,408 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [576.8593, 486.8055, 500.19226, 556.77106, 469.77252, 489.926, 673.4591, 612.39185, 455.786, 542.28217]
2025-05-10 12:44:09,408 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [183.0, 180.0, 175.0, 183.0, 178.0, 175.0, 215.0, 196.0, 169.0, 202.0]
2025-05-10 12:44:09,412 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 26/100 (estimated time remaining: 3 hours, 19 minutes, 13 seconds)
2025-05-10 12:46:44,937 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 12:46:47,621 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 524.07849 ± 48.342
2025-05-10 12:46:47,622 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [475.22015, 469.60803, 522.4404, 473.24817, 635.28925, 551.82965, 517.8637, 555.9498, 544.13873, 495.19714]
2025-05-10 12:46:47,622 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [167.0, 164.0, 178.0, 181.0, 198.0, 177.0, 184.0, 187.0, 179.0, 181.0]
2025-05-10 12:46:47,627 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 27/100 (estimated time remaining: 3 hours, 16 minutes, 8 seconds)
2025-05-10 12:49:24,520 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 12:49:27,088 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 455.66681 ± 135.642
2025-05-10 12:49:27,088 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [587.4413, 447.54684, 208.47743, 496.04706, 245.15036, 379.70724, 600.4964, 624.80176, 519.5948, 447.40485]
2025-05-10 12:49:27,088 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [220.0, 164.0, 103.0, 175.0, 116.0, 162.0, 202.0, 216.0, 184.0, 171.0]
2025-05-10 12:49:27,093 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 28/100 (estimated time remaining: 3 hours, 13 minutes, 59 seconds)
2025-05-10 12:52:03,773 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 12:52:06,738 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 578.09485 ± 70.683
2025-05-10 12:52:06,739 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [657.0222, 512.1645, 532.5139, 613.46173, 557.3802, 426.4804, 624.217, 628.9677, 562.92505, 665.8159]
2025-05-10 12:52:06,739 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [209.0, 181.0, 179.0, 214.0, 180.0, 162.0, 217.0, 215.0, 186.0, 234.0]
2025-05-10 12:52:06,744 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 29/100 (estimated time remaining: 3 hours, 10 minutes, 57 seconds)
2025-05-10 12:54:42,970 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 12:54:45,829 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 558.85394 ± 50.437
2025-05-10 12:54:45,829 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [540.54584, 630.8676, 605.4992, 509.61096, 586.8429, 540.1314, 501.46698, 574.51465, 621.8842, 477.17557]
2025-05-10 12:54:45,829 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [185.0, 221.0, 198.0, 177.0, 199.0, 189.0, 178.0, 198.0, 207.0, 164.0]
2025-05-10 12:54:45,835 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 30/100 (estimated time remaining: 3 hours, 8 minutes, 50 seconds)
2025-05-10 12:57:20,889 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 12:57:23,600 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 517.51886 ± 90.323
2025-05-10 12:57:23,600 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [574.98114, 473.01138, 524.2582, 270.6687, 518.64703, 537.54297, 606.06506, 582.03735, 571.4932, 516.48334]
2025-05-10 12:57:23,600 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [192.0, 173.0, 188.0, 119.0, 191.0, 182.0, 206.0, 198.0, 190.0, 175.0]
2025-05-10 12:57:23,606 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 31/100 (estimated time remaining: 3 hours, 5 minutes, 18 seconds)
2025-05-10 13:00:02,112 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 13:00:04,989 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 563.69409 ± 43.611
2025-05-10 13:00:04,990 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [607.54315, 639.13446, 494.47873, 524.1103, 530.2327, 526.97437, 585.0711, 594.9994, 543.55334, 590.84357]
2025-05-10 13:00:04,990 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [196.0, 213.0, 179.0, 185.0, 181.0, 186.0, 190.0, 199.0, 194.0, 190.0]
2025-05-10 13:00:04,996 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 32/100 (estimated time remaining: 3 hours, 3 minutes, 23 seconds)
2025-05-10 13:02:41,747 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 13:02:44,507 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 540.26624 ± 78.187
2025-05-10 13:02:44,507 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [615.7327, 534.62396, 568.0964, 425.49936, 642.6507, 513.4519, 588.3962, 516.24115, 389.5436, 608.4267]
2025-05-10 13:02:44,507 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [192.0, 193.0, 183.0, 153.0, 219.0, 170.0, 193.0, 178.0, 152.0, 197.0]
2025-05-10 13:02:44,513 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 33/100 (estimated time remaining: 3 hours, 44 seconds)
2025-05-10 13:05:19,486 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 13:05:22,382 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 578.53912 ± 53.202
2025-05-10 13:05:22,382 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [474.82883, 631.75165, 549.2621, 625.0923, 648.5964, 603.85266, 588.73676, 511.83514, 603.7741, 547.6613]
2025-05-10 13:05:22,382 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [165.0, 203.0, 182.0, 201.0, 212.0, 201.0, 193.0, 177.0, 208.0, 189.0]
2025-05-10 13:05:22,389 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 57 minutes, 41 seconds)
2025-05-10 13:08:00,132 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 13:08:02,892 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 469.00211 ± 71.647
2025-05-10 13:08:02,892 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [550.9784, 395.16776, 405.23828, 450.3648, 567.3946, 410.47678, 560.8442, 400.16342, 539.5177, 409.87515]
2025-05-10 13:08:02,892 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [211.0, 180.0, 170.0, 180.0, 198.0, 167.0, 204.0, 162.0, 196.0, 171.0]
2025-05-10 13:08:02,899 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 55 minutes, 21 seconds)
2025-05-10 13:10:38,170 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 13:10:41,114 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 606.19458 ± 26.980
2025-05-10 13:10:41,114 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [622.28186, 657.599, 635.33203, 594.4907, 612.7535, 584.8119, 617.16724, 594.44977, 582.26227, 560.7979]
2025-05-10 13:10:41,114 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [198.0, 208.0, 202.0, 191.0, 192.0, 190.0, 203.0, 206.0, 190.0, 185.0]
2025-05-10 13:10:41,114 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1226 [INFO]: New best (606.19) for latency MM1Queue_a033_s075
2025-05-10 13:10:41,115 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1229 [INFO]: saving network
2025-05-10 13:10:41,118 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 13:10:41,129 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 52 minutes, 47 seconds)
2025-05-10 13:13:20,009 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 13:13:23,156 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 606.49261 ± 43.960
2025-05-10 13:13:23,156 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [590.0346, 557.8826, 620.1847, 510.01404, 658.82416, 631.15643, 627.001, 622.3715, 659.6548, 587.8026]
2025-05-10 13:13:23,156 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [197.0, 200.0, 204.0, 212.0, 220.0, 213.0, 206.0, 215.0, 220.0, 202.0]
2025-05-10 13:13:23,156 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1226 [INFO]: New best (606.49) for latency MM1Queue_a033_s075
2025-05-10 13:13:23,157 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1229 [INFO]: saving network
2025-05-10 13:13:23,160 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 13:13:23,171 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 50 minutes, 16 seconds)
2025-05-10 13:15:57,981 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 13:16:00,969 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 602.30286 ± 29.597
2025-05-10 13:16:00,969 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [628.73016, 606.18896, 550.9416, 628.6327, 619.72736, 641.5537, 554.2399, 613.48975, 581.9764, 597.54834]
2025-05-10 13:16:00,969 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [200.0, 193.0, 188.0, 206.0, 205.0, 206.0, 184.0, 214.0, 187.0, 205.0]
2025-05-10 13:16:00,976 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 47 minutes, 15 seconds)
2025-05-10 13:18:36,747 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 13:18:39,668 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 571.48016 ± 67.314
2025-05-10 13:18:39,668 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [632.6455, 590.1371, 581.65454, 583.2312, 385.00333, 528.76984, 614.15204, 594.8914, 594.045, 610.2714]
2025-05-10 13:18:39,668 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [208.0, 202.0, 190.0, 194.0, 151.0, 202.0, 205.0, 198.0, 194.0, 202.0]
2025-05-10 13:18:39,675 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 44 minutes, 46 seconds)
2025-05-10 13:21:17,031 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 13:21:19,930 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 572.10925 ± 50.047
2025-05-10 13:21:19,930 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [549.8211, 593.6735, 594.06476, 568.79596, 594.2323, 600.28015, 607.1446, 579.3608, 430.47568, 603.24414]
2025-05-10 13:21:19,930 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [195.0, 199.0, 203.0, 193.0, 202.0, 189.0, 201.0, 190.0, 160.0, 194.0]
2025-05-10 13:21:19,937 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 42 minutes, 3 seconds)
2025-05-10 13:23:56,700 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 13:23:59,602 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 594.08429 ± 12.807
2025-05-10 13:23:59,602 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [576.9381, 614.4061, 607.10846, 588.8025, 573.1272, 598.603, 598.13104, 605.7552, 594.9603, 583.0107]
2025-05-10 13:23:59,602 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [186.0, 201.0, 192.0, 192.0, 188.0, 189.0, 194.0, 199.0, 197.0, 185.0]
2025-05-10 13:23:59,609 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 41/100 (estimated time remaining: 2 hours, 39 minutes, 41 seconds)
2025-05-10 13:26:37,769 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 13:26:40,653 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 596.31158 ± 19.077
2025-05-10 13:26:40,653 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [637.78314, 603.7573, 594.63947, 592.11554, 605.46234, 574.39435, 580.86163, 614.5736, 570.83057, 588.69775]
2025-05-10 13:26:40,653 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [201.0, 196.0, 201.0, 192.0, 195.0, 185.0, 195.0, 187.0, 189.0, 187.0]
2025-05-10 13:26:40,661 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 42/100 (estimated time remaining: 2 hours, 36 minutes, 50 seconds)
2025-05-10 13:29:16,746 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 13:29:19,528 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 566.77576 ± 35.088
2025-05-10 13:29:19,528 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [617.21655, 555.61, 538.62805, 521.61896, 574.7102, 575.35065, 563.7416, 521.6456, 634.7737, 564.4627]
2025-05-10 13:29:19,528 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [207.0, 179.0, 184.0, 175.0, 189.0, 183.0, 186.0, 170.0, 198.0, 185.0]
2025-05-10 13:29:19,536 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 43/100 (estimated time remaining: 2 hours, 34 minutes, 23 seconds)
2025-05-10 13:31:54,443 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 13:31:57,305 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 596.25305 ± 16.090
2025-05-10 13:31:57,305 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [613.0781, 566.0857, 617.9789, 586.9974, 590.6936, 580.7466, 594.77527, 610.0558, 588.2548, 613.8649]
2025-05-10 13:31:57,305 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [195.0, 188.0, 190.0, 185.0, 183.0, 195.0, 199.0, 190.0, 193.0, 192.0]
2025-05-10 13:31:57,335 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 44/100 (estimated time remaining: 2 hours, 31 minutes, 33 seconds)
2025-05-10 13:34:36,328 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 13:34:38,917 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 509.84448 ± 147.475
2025-05-10 13:34:38,917 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [586.8303, 597.29517, 100.446976, 559.47754, 580.5474, 570.3206, 397.2346, 602.7209, 565.26105, 538.30994]
2025-05-10 13:34:38,918 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [186.0, 196.0, 62.0, 186.0, 194.0, 190.0, 154.0, 191.0, 189.0, 184.0]
2025-05-10 13:34:38,925 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 45/100 (estimated time remaining: 2 hours, 29 minutes, 8 seconds)
2025-05-10 13:37:14,200 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 13:37:16,987 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 568.34021 ± 22.081
2025-05-10 13:37:16,987 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [590.0486, 555.367, 569.20135, 594.19025, 574.61145, 540.9592, 578.7691, 521.5038, 590.15186, 568.59906]
2025-05-10 13:37:16,987 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [193.0, 186.0, 180.0, 194.0, 194.0, 178.0, 189.0, 171.0, 198.0, 181.0]
2025-05-10 13:37:16,995 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 46/100 (estimated time remaining: 2 hours, 26 minutes, 11 seconds)
2025-05-10 13:39:54,653 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 13:39:57,455 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 584.38379 ± 33.351
2025-05-10 13:39:57,456 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [548.1536, 524.5103, 616.5839, 582.1729, 601.22614, 614.50684, 618.2592, 583.9051, 614.2941, 540.2255]
2025-05-10 13:39:57,456 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [182.0, 169.0, 198.0, 193.0, 189.0, 193.0, 188.0, 190.0, 202.0, 174.0]
2025-05-10 13:39:57,464 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 47/100 (estimated time remaining: 2 hours, 23 minutes, 25 seconds)
2025-05-10 13:42:32,506 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 13:42:35,420 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 601.66852 ± 31.123
2025-05-10 13:42:35,420 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [630.964, 664.2262, 619.16565, 544.20966, 587.94354, 599.5224, 575.32904, 582.3486, 607.05475, 605.92126]
2025-05-10 13:42:35,420 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [201.0, 205.0, 196.0, 181.0, 197.0, 190.0, 187.0, 194.0, 192.0, 197.0]
2025-05-10 13:42:35,429 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 48/100 (estimated time remaining: 2 hours, 20 minutes, 36 seconds)
2025-05-10 13:45:11,822 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 13:45:14,731 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 578.93048 ± 19.729
2025-05-10 13:45:14,731 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [596.0595, 591.2232, 546.88904, 580.7111, 554.06305, 567.15454, 574.4329, 619.04315, 583.7271, 576.0012]
2025-05-10 13:45:14,731 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [200.0, 191.0, 192.0, 200.0, 177.0, 193.0, 189.0, 204.0, 197.0, 190.0]
2025-05-10 13:45:14,740 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 49/100 (estimated time remaining: 2 hours, 18 minutes, 13 seconds)
2025-05-10 13:47:52,887 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 13:47:55,583 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 548.18884 ± 115.119
2025-05-10 13:47:55,583 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [612.45245, 528.9427, 573.3713, 603.64465, 556.5868, 591.9248, 213.40613, 591.1812, 574.3355, 636.0436]
2025-05-10 13:47:55,583 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [196.0, 179.0, 188.0, 190.0, 189.0, 192.0, 100.0, 184.0, 178.0, 205.0]
2025-05-10 13:47:55,592 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 50/100 (estimated time remaining: 2 hours, 15 minutes, 25 seconds)
2025-05-10 13:50:33,059 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 13:50:35,958 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 579.86963 ± 13.239
2025-05-10 13:50:35,959 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [581.2093, 566.8268, 595.5752, 566.3513, 567.8542, 604.8601, 590.2109, 577.22974, 564.1307, 584.44775]
2025-05-10 13:50:35,959 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [184.0, 190.0, 202.0, 189.0, 191.0, 207.0, 194.0, 192.0, 187.0, 196.0]
2025-05-10 13:50:35,967 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 51/100 (estimated time remaining: 2 hours, 13 minutes, 9 seconds)
2025-05-10 13:53:10,131 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 13:53:12,950 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 590.53009 ± 26.832
2025-05-10 13:53:12,950 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [583.03253, 617.9164, 598.8394, 583.4741, 535.7166, 626.43097, 561.0544, 578.11926, 601.07996, 619.6366]
2025-05-10 13:53:12,950 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [187.0, 198.0, 189.0, 180.0, 175.0, 189.0, 186.0, 184.0, 202.0, 188.0]
2025-05-10 13:53:12,959 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 52/100 (estimated time remaining: 2 hours, 9 minutes, 55 seconds)
2025-05-10 13:55:52,405 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 13:55:55,228 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 566.59808 ± 25.206
2025-05-10 13:55:55,228 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [561.73065, 522.9504, 594.2345, 575.422, 592.0441, 570.6165, 590.36865, 519.01715, 575.71906, 563.878]
2025-05-10 13:55:55,228 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [182.0, 178.0, 201.0, 191.0, 191.0, 189.0, 193.0, 172.0, 196.0, 190.0]
2025-05-10 13:55:55,237 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 53/100 (estimated time remaining: 2 hours, 7 minutes, 58 seconds)
2025-05-10 13:58:29,847 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 13:58:32,702 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 565.61371 ± 17.040
2025-05-10 13:58:32,702 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [583.09894, 567.40326, 578.01807, 527.97845, 558.1429, 579.7489, 551.7736, 566.9411, 587.4209, 555.6107]
2025-05-10 13:58:32,702 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [192.0, 193.0, 191.0, 179.0, 199.0, 190.0, 184.0, 187.0, 194.0, 188.0]
2025-05-10 13:58:32,711 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 54/100 (estimated time remaining: 2 hours, 5 minutes)
2025-05-10 14:01:10,213 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 14:01:12,984 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 534.08447 ± 23.340
2025-05-10 14:01:12,984 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [512.4406, 521.7808, 510.56998, 546.1646, 548.0231, 549.57227, 543.1515, 529.391, 497.90228, 581.84863]
2025-05-10 14:01:12,984 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [187.0, 186.0, 183.0, 189.0, 188.0, 184.0, 184.0, 186.0, 179.0, 190.0]
2025-05-10 14:01:12,993 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 55/100 (estimated time remaining: 2 hours, 2 minutes, 16 seconds)
2025-05-10 14:03:50,103 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 14:03:53,045 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 594.98962 ± 16.900
2025-05-10 14:03:53,045 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [610.32733, 568.6899, 571.28546, 612.0941, 581.17694, 610.12134, 595.1175, 593.17596, 619.6211, 588.28674]
2025-05-10 14:03:53,045 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [198.0, 183.0, 194.0, 201.0, 187.0, 201.0, 199.0, 197.0, 207.0, 195.0]
2025-05-10 14:03:53,054 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 59 minutes, 33 seconds)
2025-05-10 14:06:31,060 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 14:06:33,916 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 584.11517 ± 13.701
2025-05-10 14:06:33,916 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [599.2931, 598.98395, 581.4769, 583.33307, 563.24915, 566.6787, 589.9255, 593.73175, 598.75507, 565.7249]
2025-05-10 14:06:33,917 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [194.0, 202.0, 199.0, 192.0, 185.0, 180.0, 196.0, 187.0, 187.0, 186.0]
2025-05-10 14:06:33,926 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 57 minutes, 28 seconds)
2025-05-10 14:09:09,096 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 14:09:11,918 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 556.44946 ± 66.636
2025-05-10 14:09:11,918 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [592.8009, 568.1449, 584.0471, 364.23483, 611.5463, 558.03076, 551.199, 575.85864, 598.6765, 559.95557]
2025-05-10 14:09:11,918 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [197.0, 190.0, 191.0, 145.0, 212.0, 190.0, 190.0, 189.0, 206.0, 184.0]
2025-05-10 14:09:11,928 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 54 minutes, 11 seconds)
2025-05-10 14:11:47,927 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 14:11:50,682 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 520.20477 ± 147.731
2025-05-10 14:11:50,682 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [571.449, 530.53845, 576.2802, 555.8932, 572.0027, 550.4587, 602.35333, 571.4311, 80.74572, 590.89606]
2025-05-10 14:11:50,682 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [188.0, 201.0, 201.0, 183.0, 199.0, 197.0, 213.0, 194.0, 54.0, 207.0]
2025-05-10 14:11:50,692 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 51 minutes, 43 seconds)
2025-05-10 14:14:30,011 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 14:14:33,012 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 596.08057 ± 19.161
2025-05-10 14:14:33,012 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [596.84235, 573.9575, 624.27625, 598.77454, 621.79675, 606.18787, 572.61273, 574.9487, 613.2832, 578.1259]
2025-05-10 14:14:33,012 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [209.0, 201.0, 201.0, 185.0, 212.0, 211.0, 189.0, 203.0, 201.0, 194.0]
2025-05-10 14:14:33,023 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 49 minutes, 20 seconds)
2025-05-10 14:17:09,551 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 14:17:12,236 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 529.06628 ± 149.369
2025-05-10 14:17:12,236 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [590.6037, 83.07272, 606.96924, 582.87286, 554.30786, 573.1985, 586.14886, 584.5639, 561.27246, 567.6523]
2025-05-10 14:17:12,236 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [198.0, 56.0, 204.0, 196.0, 185.0, 193.0, 189.0, 197.0, 183.0, 190.0]
2025-05-10 14:17:12,246 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 46 minutes, 33 seconds)
2025-05-10 14:19:47,456 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 14:19:50,485 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 572.95447 ± 33.748
2025-05-10 14:19:50,485 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [582.66614, 589.5733, 578.02, 533.07623, 636.6181, 523.73083, 605.9182, 542.59125, 547.6807, 589.6699]
2025-05-10 14:19:50,485 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [195.0, 200.0, 208.0, 192.0, 227.0, 198.0, 201.0, 189.0, 203.0, 200.0]
2025-05-10 14:19:50,496 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 43 minutes, 33 seconds)
2025-05-10 14:22:27,358 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 14:22:30,010 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 507.59277 ± 136.712
2025-05-10 14:22:30,010 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [606.9881, 356.21588, 607.44336, 587.3526, 562.50836, 628.7196, 375.33636, 201.74304, 572.0686, 577.552]
2025-05-10 14:22:30,010 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [196.0, 150.0, 196.0, 195.0, 190.0, 207.0, 154.0, 97.0, 198.0, 187.0]
2025-05-10 14:22:30,021 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 41 minutes, 5 seconds)
2025-05-10 14:25:07,213 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 14:25:10,109 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 573.83246 ± 21.353
2025-05-10 14:25:10,109 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [557.1334, 615.4536, 563.2925, 573.85205, 566.0657, 534.4035, 578.95746, 586.4983, 597.5038, 565.16473]
2025-05-10 14:25:10,109 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [187.0, 196.0, 190.0, 193.0, 197.0, 186.0, 190.0, 200.0, 203.0, 189.0]
2025-05-10 14:25:10,120 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 38 minutes, 35 seconds)
2025-05-10 14:27:48,447 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 14:27:51,273 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 541.76184 ± 69.462
2025-05-10 14:27:51,273 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [580.2565, 579.16113, 587.439, 583.2374, 579.0034, 555.70465, 576.77686, 426.07828, 565.6789, 384.28232]
2025-05-10 14:27:51,273 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [201.0, 196.0, 197.0, 201.0, 199.0, 194.0, 188.0, 158.0, 191.0, 157.0]
2025-05-10 14:27:51,284 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 35 minutes, 47 seconds)
2025-05-10 14:30:28,136 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 14:30:31,139 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 600.00665 ± 36.811
2025-05-10 14:30:31,139 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [612.2268, 607.0685, 594.23236, 643.59796, 630.2269, 502.51193, 627.35974, 596.22876, 601.8783, 584.73474]
2025-05-10 14:30:31,139 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [190.0, 205.0, 204.0, 207.0, 210.0, 183.0, 201.0, 210.0, 204.0, 189.0]
2025-05-10 14:30:31,151 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 33 minutes, 12 seconds)
2025-05-10 14:33:07,845 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 14:33:10,719 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 576.22418 ± 83.924
2025-05-10 14:33:10,720 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [623.4374, 397.45126, 592.46576, 445.65692, 582.3628, 591.7179, 610.08527, 589.41583, 693.397, 636.25183]
2025-05-10 14:33:10,720 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [206.0, 158.0, 198.0, 164.0, 197.0, 189.0, 195.0, 192.0, 224.0, 195.0]
2025-05-10 14:33:10,731 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 30 minutes, 41 seconds)
2025-05-10 14:35:47,234 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 14:35:50,192 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 596.73059 ± 26.530
2025-05-10 14:35:50,192 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [595.1135, 547.07294, 618.1784, 580.7345, 599.37787, 581.0674, 608.14154, 638.36096, 570.2899, 628.9686]
2025-05-10 14:35:50,192 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [199.0, 176.0, 201.0, 196.0, 204.0, 191.0, 201.0, 214.0, 193.0, 197.0]
2025-05-10 14:35:50,204 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 28 minutes, 1 second)
2025-05-10 14:38:27,680 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 14:38:30,516 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 565.05859 ± 16.801
2025-05-10 14:38:30,517 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [549.49646, 591.12823, 584.46106, 533.2167, 563.7029, 560.2293, 575.91833, 578.8572, 561.2777, 552.2986]
2025-05-10 14:38:30,517 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [189.0, 182.0, 191.0, 179.0, 192.0, 190.0, 202.0, 196.0, 188.0, 186.0]
2025-05-10 14:38:30,528 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 25 minutes, 22 seconds)
2025-05-10 14:41:07,098 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 14:41:10,055 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 612.07172 ± 26.269
2025-05-10 14:41:10,056 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [602.5778, 629.56946, 611.7655, 564.28723, 674.0757, 612.138, 608.2003, 618.4402, 603.0772, 596.58655]
2025-05-10 14:41:10,056 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [196.0, 203.0, 196.0, 186.0, 216.0, 189.0, 193.0, 200.0, 195.0, 197.0]
2025-05-10 14:41:10,056 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1226 [INFO]: New best (612.07) for latency MM1Queue_a033_s075
2025-05-10 14:41:10,056 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1229 [INFO]: saving network
2025-05-10 14:41:10,060 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 14:41:10,076 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 22 minutes, 32 seconds)
2025-05-10 14:43:46,321 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 14:43:49,198 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 565.88556 ± 66.090
2025-05-10 14:43:49,198 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [611.2904, 538.6202, 578.43896, 619.4619, 608.3663, 547.28516, 393.51892, 579.9106, 638.5638, 543.39886]
2025-05-10 14:43:49,198 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [207.0, 184.0, 196.0, 210.0, 195.0, 188.0, 155.0, 194.0, 199.0, 197.0]
2025-05-10 14:43:49,210 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 19 minutes, 48 seconds)
2025-05-10 14:46:25,667 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 14:46:28,623 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 594.53351 ± 60.634
2025-05-10 14:46:28,624 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [593.24304, 634.4976, 629.1535, 611.49414, 643.705, 419.3615, 603.3184, 614.1189, 591.56616, 604.87616]
2025-05-10 14:46:28,624 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [189.0, 203.0, 202.0, 199.0, 205.0, 164.0, 200.0, 206.0, 207.0, 194.0]
2025-05-10 14:46:28,636 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 17 minutes, 7 seconds)
2025-05-10 14:49:05,723 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 14:49:08,736 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 611.35071 ± 49.975
2025-05-10 14:49:08,737 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [657.8296, 587.35956, 607.3515, 616.0368, 638.66956, 480.4537, 587.984, 655.0617, 646.8721, 635.88837]
2025-05-10 14:49:08,737 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [213.0, 191.0, 204.0, 195.0, 193.0, 181.0, 204.0, 214.0, 205.0, 209.0]
2025-05-10 14:49:08,749 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 14 minutes, 31 seconds)
2025-05-10 14:51:46,913 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 14:51:49,807 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 593.85321 ± 19.487
2025-05-10 14:51:49,808 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [588.28577, 560.83673, 608.0149, 580.32275, 574.5717, 579.18036, 626.19495, 616.57886, 601.4104, 603.13574]
2025-05-10 14:51:49,808 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [192.0, 187.0, 199.0, 196.0, 187.0, 188.0, 202.0, 192.0, 195.0, 195.0]
2025-05-10 14:51:49,820 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 11 minutes, 56 seconds)
2025-05-10 14:54:28,426 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 14:54:31,349 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 610.32874 ± 14.957
2025-05-10 14:54:31,349 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [594.7235, 600.8708, 604.7592, 623.41296, 586.40857, 620.90393, 594.2801, 621.8341, 628.2853, 627.80896]
2025-05-10 14:54:31,350 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [186.0, 196.0, 190.0, 207.0, 193.0, 195.0, 195.0, 188.0, 199.0, 198.0]
2025-05-10 14:54:31,362 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 9 minutes, 26 seconds)
2025-05-10 14:57:07,055 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 14:57:09,961 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 595.45117 ± 12.562
2025-05-10 14:57:09,961 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [579.6596, 585.29297, 588.52484, 577.96674, 593.45844, 611.2544, 611.0355, 595.0243, 597.59814, 614.6969]
2025-05-10 14:57:09,961 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [188.0, 191.0, 192.0, 190.0, 191.0, 204.0, 197.0, 187.0, 196.0, 201.0]
2025-05-10 14:57:09,974 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 6 minutes, 43 seconds)
2025-05-10 14:59:45,911 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 14:59:48,790 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 583.41455 ± 59.360
2025-05-10 14:59:48,790 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [412.1792, 579.12366, 589.52374, 606.7199, 609.47687, 611.5756, 587.311, 618.395, 634.4149, 585.4257]
2025-05-10 14:59:48,791 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [157.0, 192.0, 191.0, 195.0, 197.0, 192.0, 193.0, 205.0, 203.0, 192.0]
2025-05-10 14:59:48,803 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 4 minutes)
2025-05-10 15:02:27,142 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 15:02:30,006 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 594.25122 ± 40.900
2025-05-10 15:02:30,006 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [590.2464, 479.8443, 633.7105, 619.9538, 588.48047, 588.94037, 598.2358, 606.8139, 621.93506, 614.35144]
2025-05-10 15:02:30,007 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [188.0, 163.0, 205.0, 197.0, 186.0, 190.0, 187.0, 195.0, 198.0, 196.0]
2025-05-10 15:02:30,019 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 1 minute, 25 seconds)
2025-05-10 15:05:05,256 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 15:05:08,138 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 579.50037 ± 44.003
2025-05-10 15:05:08,138 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [608.879, 577.94696, 593.5403, 600.2202, 556.878, 456.47775, 584.8472, 609.79834, 611.9734, 594.4417]
2025-05-10 15:05:08,138 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [194.0, 198.0, 196.0, 191.0, 188.0, 167.0, 197.0, 196.0, 195.0, 196.0]
2025-05-10 15:05:08,151 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 79/100 (estimated time remaining: 58 minutes, 32 seconds)
2025-05-10 15:07:46,961 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 15:07:49,934 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 614.79767 ± 10.833
2025-05-10 15:07:49,934 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [622.9174, 624.203, 621.4498, 625.64246, 614.6172, 611.94214, 614.38873, 603.68463, 588.5019, 620.62885]
2025-05-10 15:07:49,934 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [205.0, 206.0, 203.0, 197.0, 192.0, 199.0, 197.0, 191.0, 191.0, 197.0]
2025-05-10 15:07:49,935 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1226 [INFO]: New best (614.80) for latency MM1Queue_a033_s075
2025-05-10 15:07:49,935 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1229 [INFO]: saving network
2025-05-10 15:07:49,939 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 15:07:49,957 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 80/100 (estimated time remaining: 55 minutes, 54 seconds)
2025-05-10 15:10:27,838 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 15:10:30,397 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 511.18961 ± 204.827
2025-05-10 15:10:30,397 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [583.0429, 624.9385, 597.5292, 104.95512, 624.5809, 626.7612, 622.0815, 597.8831, 629.84985, 100.273895]
2025-05-10 15:10:30,397 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [188.0, 195.0, 188.0, 68.0, 204.0, 209.0, 198.0, 196.0, 200.0, 63.0]
2025-05-10 15:10:30,411 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 81/100 (estimated time remaining: 53 minutes, 21 seconds)
2025-05-10 15:13:04,960 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 15:13:07,879 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 601.70551 ± 26.759
2025-05-10 15:13:07,879 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [615.1969, 619.24146, 607.19916, 529.4464, 601.94006, 603.4506, 633.59283, 588.5061, 616.7194, 601.7626]
2025-05-10 15:13:07,879 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [191.0, 195.0, 202.0, 180.0, 195.0, 192.0, 206.0, 192.0, 197.0, 195.0]
2025-05-10 15:13:07,893 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 82/100 (estimated time remaining: 50 minutes, 36 seconds)
2025-05-10 15:15:45,631 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 15:15:48,505 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 599.67236 ± 48.823
2025-05-10 15:15:48,505 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [598.6845, 624.6921, 629.9588, 646.2866, 463.26932, 631.2997, 600.8389, 600.6892, 616.4742, 584.5304]
2025-05-10 15:15:48,505 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [188.0, 199.0, 190.0, 199.0, 170.0, 197.0, 197.0, 186.0, 200.0, 192.0]
2025-05-10 15:15:48,518 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 83/100 (estimated time remaining: 47 minutes, 54 seconds)
2025-05-10 15:18:26,119 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 15:18:29,037 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 612.58105 ± 13.445
2025-05-10 15:18:29,037 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [591.35504, 601.46967, 623.15265, 639.9957, 601.84656, 611.2677, 609.6392, 618.49695, 604.1955, 624.3914]
2025-05-10 15:18:29,037 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [188.0, 192.0, 202.0, 204.0, 192.0, 195.0, 192.0, 193.0, 190.0, 200.0]
2025-05-10 15:18:29,051 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 84/100 (estimated time remaining: 45 minutes, 23 seconds)
2025-05-10 15:21:06,833 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 15:21:09,818 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 624.45398 ± 26.242
2025-05-10 15:21:09,818 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [634.70306, 592.9263, 603.3702, 670.2948, 611.5253, 642.7491, 662.1779, 598.2784, 628.8071, 599.70746]
2025-05-10 15:21:09,818 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [200.0, 195.0, 192.0, 201.0, 193.0, 208.0, 215.0, 197.0, 201.0, 190.0]
2025-05-10 15:21:09,818 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1226 [INFO]: New best (624.45) for latency MM1Queue_a033_s075
2025-05-10 15:21:09,818 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1229 [INFO]: saving network
2025-05-10 15:21:09,822 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 15:21:09,841 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 85/100 (estimated time remaining: 42 minutes, 39 seconds)
2025-05-10 15:23:45,931 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 15:23:48,929 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 625.09094 ± 23.923
2025-05-10 15:23:48,929 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [612.8011, 684.33636, 632.94226, 597.68005, 603.39966, 626.68933, 631.94, 636.17804, 624.4635, 600.47955]
2025-05-10 15:23:48,929 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [198.0, 221.0, 208.0, 191.0, 199.0, 194.0, 203.0, 197.0, 201.0, 194.0]
2025-05-10 15:23:48,930 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1226 [INFO]: New best (625.09) for latency MM1Queue_a033_s075
2025-05-10 15:23:48,930 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1229 [INFO]: saving network
2025-05-10 15:23:48,934 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 15:23:48,952 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 86/100 (estimated time remaining: 39 minutes, 55 seconds)
2025-05-10 15:26:24,435 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 15:26:27,340 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 598.86206 ± 17.729
2025-05-10 15:26:27,340 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [617.4382, 617.3586, 584.29114, 628.38794, 605.03156, 578.42017, 575.3601, 584.6872, 589.8025, 607.84326]
2025-05-10 15:26:27,340 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [201.0, 192.0, 189.0, 197.0, 196.0, 194.0, 189.0, 199.0, 187.0, 194.0]
2025-05-10 15:26:27,354 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 87/100 (estimated time remaining: 37 minutes, 18 seconds)
2025-05-10 15:29:04,161 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 15:29:07,049 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 599.48358 ± 15.672
2025-05-10 15:29:07,049 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [606.7279, 602.54626, 603.5333, 609.8634, 596.24615, 584.06744, 617.0037, 622.27856, 584.003, 568.56635]
2025-05-10 15:29:07,049 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [195.0, 193.0, 197.0, 197.0, 198.0, 188.0, 197.0, 196.0, 190.0, 184.0]
2025-05-10 15:29:07,064 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 88/100 (estimated time remaining: 34 minutes, 36 seconds)
2025-05-10 15:31:44,427 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 15:31:47,337 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 608.92633 ± 30.217
2025-05-10 15:31:47,337 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [617.62634, 630.07666, 624.379, 579.7849, 542.1607, 644.9215, 621.4167, 634.12463, 578.554, 616.21857]
2025-05-10 15:31:47,338 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [192.0, 201.0, 195.0, 186.0, 178.0, 201.0, 201.0, 198.0, 190.0, 202.0]
2025-05-10 15:31:47,352 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 89/100 (estimated time remaining: 31 minutes, 55 seconds)
2025-05-10 15:34:26,029 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 15:34:28,977 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 612.47925 ± 14.880
2025-05-10 15:34:28,977 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [625.2123, 603.9385, 607.98065, 598.75055, 623.9731, 626.7048, 636.7795, 587.1347, 615.7136, 598.605]
2025-05-10 15:34:28,977 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [195.0, 189.0, 196.0, 195.0, 196.0, 202.0, 204.0, 189.0, 202.0, 197.0]
2025-05-10 15:34:28,992 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 90/100 (estimated time remaining: 29 minutes, 18 seconds)
2025-05-10 15:37:05,854 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 15:37:08,834 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 617.14526 ± 26.540
2025-05-10 15:37:08,834 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [617.80316, 615.9095, 637.57495, 609.2607, 622.3047, 618.10754, 656.5985, 648.26746, 566.5369, 579.08875]
2025-05-10 15:37:08,834 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [197.0, 200.0, 205.0, 198.0, 202.0, 197.0, 204.0, 209.0, 186.0, 185.0]
2025-05-10 15:37:08,849 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 91/100 (estimated time remaining: 26 minutes, 39 seconds)
2025-05-10 15:39:46,029 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 15:39:48,957 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 609.51288 ± 39.083
2025-05-10 15:39:48,957 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [614.07043, 502.6164, 624.1718, 614.7938, 665.24506, 622.2802, 605.7477, 606.37006, 626.0239, 613.8095]
2025-05-10 15:39:48,957 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [196.0, 169.0, 197.0, 201.0, 203.0, 201.0, 195.0, 193.0, 195.0, 203.0]
2025-05-10 15:39:48,973 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 92/100 (estimated time remaining: 24 minutes, 2 seconds)
2025-05-10 15:42:25,261 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 15:42:28,188 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 613.36548 ± 22.856
2025-05-10 15:42:28,189 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [617.4183, 642.7356, 624.31274, 588.4273, 649.2692, 606.6386, 634.2712, 601.9515, 586.9336, 581.6968]
2025-05-10 15:42:28,189 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [199.0, 204.0, 194.0, 190.0, 203.0, 190.0, 199.0, 195.0, 189.0, 189.0]
2025-05-10 15:42:28,204 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 93/100 (estimated time remaining: 21 minutes, 21 seconds)
2025-05-10 15:45:04,893 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 15:45:07,367 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 478.56073 ± 204.207
2025-05-10 15:45:07,367 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [594.15234, 217.27332, 87.859726, 622.17413, 593.13464, 641.9762, 615.9291, 208.71837, 604.88367, 599.50604]
2025-05-10 15:45:07,367 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [192.0, 104.0, 57.0, 207.0, 203.0, 208.0, 200.0, 95.0, 197.0, 196.0]
2025-05-10 15:45:07,383 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 94/100 (estimated time remaining: 18 minutes, 40 seconds)
2025-05-10 15:47:47,101 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 15:47:50,110 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 627.58612 ± 24.617
2025-05-10 15:47:50,110 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [688.68616, 614.11273, 626.81415, 611.63666, 653.8831, 607.48096, 636.62317, 613.26825, 610.75543, 612.6011]
2025-05-10 15:47:50,110 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [211.0, 197.0, 202.0, 188.0, 209.0, 193.0, 198.0, 199.0, 199.0, 202.0]
2025-05-10 15:47:50,110 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1226 [INFO]: New best (627.59) for latency MM1Queue_a033_s075
2025-05-10 15:47:50,111 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1229 [INFO]: saving network
2025-05-10 15:47:50,114 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 15:47:50,134 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 95/100 (estimated time remaining: 16 minutes, 1 second)
2025-05-10 15:50:25,801 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 15:50:28,499 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 551.46271 ± 152.449
2025-05-10 15:50:28,499 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [577.7223, 630.8407, 633.63617, 590.8049, 571.9486, 97.96275, 620.10425, 602.32153, 596.5116, 592.77454]
2025-05-10 15:50:28,499 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [186.0, 199.0, 204.0, 187.0, 190.0, 61.0, 194.0, 193.0, 192.0, 191.0]
2025-05-10 15:50:28,515 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 96/100 (estimated time remaining: 13 minutes, 19 seconds)
2025-05-10 15:53:04,577 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 15:53:07,524 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 614.84235 ± 18.063
2025-05-10 15:53:07,525 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [623.7403, 586.0237, 636.4525, 638.56305, 635.69, 606.8971, 595.1225, 594.56195, 619.1486, 612.2241]
2025-05-10 15:53:07,525 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [197.0, 190.0, 199.0, 200.0, 200.0, 196.0, 186.0, 197.0, 199.0, 202.0]
2025-05-10 15:53:07,541 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 97/100 (estimated time remaining: 10 minutes, 38 seconds)
2025-05-10 15:55:45,430 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 15:55:48,332 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 611.95538 ± 8.989
2025-05-10 15:55:48,332 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [624.8844, 601.29877, 619.104, 614.11084, 605.2595, 617.21075, 599.8115, 604.173, 608.39215, 625.309]
2025-05-10 15:55:48,333 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [200.0, 190.0, 196.0, 194.0, 192.0, 193.0, 189.0, 190.0, 186.0, 202.0]
2025-05-10 15:55:48,349 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 98/100 (estimated time remaining: 8 minutes)
2025-05-10 15:58:28,118 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 15:58:31,090 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 596.52917 ± 22.235
2025-05-10 15:58:31,090 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [621.5429, 586.2951, 595.87616, 547.72986, 592.7858, 623.8666, 572.0205, 613.45526, 603.74274, 607.9773]
2025-05-10 15:58:31,090 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [202.0, 198.0, 203.0, 195.0, 186.0, 207.0, 197.0, 198.0, 202.0, 196.0]
2025-05-10 15:58:31,107 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 99/100 (estimated time remaining: 5 minutes, 21 seconds)
2025-05-10 16:01:07,427 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 16:01:10,239 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 585.13086 ± 19.277
2025-05-10 16:01:10,239 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [604.008, 578.59357, 582.56116, 601.7468, 568.9243, 540.2206, 583.7807, 609.0294, 583.887, 598.5567]
2025-05-10 16:01:10,239 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [192.0, 187.0, 191.0, 190.0, 183.0, 174.0, 188.0, 196.0, 183.0, 199.0]
2025-05-10 16:01:10,256 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 40 seconds)
2025-05-10 16:03:47,024 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 16:03:50,029 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 612.94904 ± 20.220
2025-05-10 16:03:50,029 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [614.4548, 611.3841, 618.0313, 640.6293, 611.95465, 650.87396, 597.5084, 612.058, 596.11285, 576.4829]
2025-05-10 16:03:50,029 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [200.0, 194.0, 194.0, 228.0, 191.0, 204.0, 194.0, 194.0, 197.0, 203.0]
2025-05-10 16:03:50,046 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1251 [DEBUG]: Training session finished
