2025-05-09 02:42:07,176 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-sac
2025-05-09 02:42:07,176 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-sac
2025-05-09 02:42:07,176 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x7717bb63ef70>}
2025-05-09 02:42:07,176 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1111 [DEBUG]: using device: cpu
2025-05-09 02:42:07,181 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1133 [INFO]: Creating new trainer
2025-05-09 02:42:07,186 baseline-sac-noisy-ant:111 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=27, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1., -1., -1.]]))
)
2025-05-09 02:42:07,186 baseline-sac-noisy-ant:112 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=35, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-09 02:42:07,347 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1194 [DEBUG]: Starting training session...
2025-05-09 02:42:07,347 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 1/100
2025-05-09 02:44:35,619 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 02:44:39,873 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -334.80814 ± 538.176
2025-05-09 02:44:39,873 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-11.737384, -18.538132, -22.16979, -1344.2388, -75.254654, -67.29243, -29.923767, -1461.414, -80.89257, -236.61983]
2025-05-09 02:44:39,873 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [30.0, 19.0, 15.0, 1000.0, 77.0, 56.0, 19.0, 1000.0, 58.0, 132.0]
2025-05-09 02:44:39,873 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1226 [INFO]: New best (-334.81) for latency MM1Queue_a033_s075
2025-05-09 02:44:39,874 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1229 [INFO]: saving network
2025-05-09 02:44:39,877 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-sac/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 02:44:39,882 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 2/100 (estimated time remaining: 4 hours, 11 minutes, 41 seconds)
2025-05-09 02:47:27,062 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 02:47:33,242 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -107.92206 ± 129.582
2025-05-09 02:47:33,242 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-25.299183, -68.42144, -39.649624, 2.4288747, -355.929, -26.255789, -254.75285, 13.975214, -289.3504, -35.966488]
2025-05-09 02:47:33,242 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [22.0, 117.0, 114.0, 14.0, 1000.0, 139.0, 1000.0, 86.0, 1000.0, 34.0]
2025-05-09 02:47:33,242 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1226 [INFO]: New best (-107.92) for latency MM1Queue_a033_s075
2025-05-09 02:47:33,242 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1229 [INFO]: saving network
2025-05-09 02:47:33,246 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-sac/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 02:47:33,252 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 3/100 (estimated time remaining: 4 hours, 26 minutes, 9 seconds)
2025-05-09 02:50:00,545 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 02:50:03,277 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -40.81995 ± 94.958
2025-05-09 02:50:03,277 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-7.155845, -33.67256, -2.9779916, -11.431874, 17.605406, -323.4273, -12.349081, -11.654378, -7.2948403, -15.841048]
2025-05-09 02:50:03,277 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [41.0, 91.0, 58.0, 22.0, 57.0, 1000.0, 95.0, 44.0, 189.0, 44.0]
2025-05-09 02:50:03,278 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1226 [INFO]: New best (-40.82) for latency MM1Queue_a033_s075
2025-05-09 02:50:03,278 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1229 [INFO]: saving network
2025-05-09 02:50:03,281 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-sac/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 02:50:03,287 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 4/100 (estimated time remaining: 4 hours, 16 minutes, 28 seconds)
2025-05-09 02:52:41,979 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 02:52:47,009 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -72.84704 ± 79.856
2025-05-09 02:52:47,010 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-28.512957, -12.821308, -223.07109, -222.4774, -8.799368, -18.71139, -53.806923, -9.203154, -45.66384, -105.40299]
2025-05-09 02:52:47,010 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [121.0, 138.0, 1000.0, 1000.0, 40.0, 64.0, 223.0, 18.0, 114.0, 212.0]
2025-05-09 02:52:47,011 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 5/100 (estimated time remaining: 4 hours, 15 minutes, 51 seconds)
2025-05-09 02:55:33,478 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 02:55:36,180 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -45.32934 ± 72.767
2025-05-09 02:55:36,180 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-4.979193, -256.12787, -25.43634, -22.51933, -6.409423, -54.737034, 2.817128, -16.645548, -56.95727, -12.298496]
2025-05-09 02:55:36,180 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [24.0, 1000.0, 22.0, 73.0, 79.0, 75.0, 19.0, 149.0, 84.0, 93.0]
2025-05-09 02:55:36,182 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 6/100 (estimated time remaining: 4 hours, 16 minutes, 7 seconds)
2025-05-09 02:58:04,225 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 02:58:07,101 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -41.97090 ± 89.957
2025-05-09 02:58:07,102 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [1.1091429, -121.99314, -284.33112, 8.275277, -24.77418, 4.281769, 33.2342, -0.58223575, -3.4261217, -31.502586]
2025-05-09 02:58:07,102 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [102.0, 279.0, 1000.0, 78.0, 51.0, 40.0, 38.0, 48.0, 47.0, 48.0]
2025-05-09 02:58:07,104 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 7/100 (estimated time remaining: 4 hours, 12 minutes, 55 seconds)
2025-05-09 03:00:51,633 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 03:00:53,965 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -29.68028 ± 72.477
2025-05-09 03:00:53,965 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-5.3060994, -19.532827, 3.382275, -5.2336493, 8.4217, -6.5626454, -18.451365, -245.37746, -13.861574, 5.7188935]
2025-05-09 03:00:53,965 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [11.0, 51.0, 33.0, 53.0, 14.0, 44.0, 42.0, 1000.0, 64.0, 52.0]
2025-05-09 03:00:53,965 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1226 [INFO]: New best (-29.68) for latency MM1Queue_a033_s075
2025-05-09 03:00:53,966 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1229 [INFO]: saving network
2025-05-09 03:00:53,969 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-sac/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 03:00:53,976 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 8/100 (estimated time remaining: 4 hours, 8 minutes, 13 seconds)
2025-05-09 03:03:36,769 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 03:03:43,633 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -176.62213 ± 161.250
2025-05-09 03:03:43,634 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-431.06754, -1.1741891, -69.00308, -32.20387, -59.247643, -419.6106, -396.81772, -129.56224, -95.99941, -131.53517]
2025-05-09 03:03:43,634 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 108.0, 48.0, 199.0, 192.0, 1000.0, 1000.0, 153.0, 116.0, 119.0]
2025-05-09 03:03:43,636 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 9/100 (estimated time remaining: 4 hours, 11 minutes, 34 seconds)
2025-05-09 03:06:21,702 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 03:06:24,476 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -68.88731 ± 123.281
2025-05-09 03:06:24,476 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-36.420765, -29.420027, -39.4497, -1.5618112, -15.708321, -36.262665, -43.856895, -50.834785, 0.09251792, -435.45068]
2025-05-09 03:06:24,476 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [59.0, 83.0, 138.0, 17.0, 56.0, 44.0, 74.0, 144.0, 50.0, 1000.0]
2025-05-09 03:06:24,479 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 10/100 (estimated time remaining: 4 hours, 7 minutes, 57 seconds)
2025-05-09 03:08:56,901 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 03:08:58,085 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -32.01979 ± 76.372
2025-05-09 03:08:58,085 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [19.750883, -1.8255519, -0.24067277, -250.31891, -40.37521, -60.51252, -8.449544, 7.1115646, 9.008127, 5.6539474]
2025-05-09 03:08:58,085 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [36.0, 38.0, 43.0, 292.0, 74.0, 112.0, 94.0, 22.0, 62.0, 26.0]
2025-05-09 03:08:58,087 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 11/100 (estimated time remaining: 4 hours, 34 seconds)
2025-05-09 03:11:35,762 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 03:11:37,211 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -29.65366 ± 67.288
2025-05-09 03:11:37,211 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [12.266253, 3.960575, -153.96388, -9.698895, -157.67908, -13.034434, -32.455894, -16.984512, 10.968233, 60.085064]
2025-05-09 03:11:37,211 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [17.0, 16.0, 177.0, 37.0, 239.0, 41.0, 240.0, 17.0, 23.0, 166.0]
2025-05-09 03:11:37,212 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1226 [INFO]: New best (-29.65) for latency MM1Queue_a033_s075
2025-05-09 03:11:37,212 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1229 [INFO]: saving network
2025-05-09 03:11:37,216 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-sac/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 03:11:37,223 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 12/100 (estimated time remaining: 4 hours, 20 seconds)
2025-05-09 03:14:11,677 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 03:14:17,515 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -88.22537 ± 128.822
2025-05-09 03:14:17,516 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [18.162247, -15.569611, 5.5882554, -11.057528, -4.401981, -299.2707, -260.60776, -291.9888, -15.957472, -7.150391]
2025-05-09 03:14:17,516 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [52.0, 14.0, 28.0, 24.0, 107.0, 1000.0, 1000.0, 1000.0, 41.0, 28.0]
2025-05-09 03:14:17,519 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 13/100 (estimated time remaining: 3 hours, 55 minutes, 42 seconds)
2025-05-09 03:17:06,838 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 03:17:09,604 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -61.09162 ± 120.910
2025-05-09 03:17:09,604 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-26.198692, -17.844685, -15.520136, 10.782345, -126.26299, -56.39842, -403.17633, 19.788063, -4.277132, 8.191856]
2025-05-09 03:17:09,604 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [40.0, 46.0, 65.0, 76.0, 213.0, 114.0, 1000.0, 47.0, 20.0, 34.0]
2025-05-09 03:17:09,607 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 14/100 (estimated time remaining: 3 hours, 53 minutes, 43 seconds)
2025-05-09 03:19:33,726 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 03:19:34,783 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -21.47989 ± 24.819
2025-05-09 03:19:34,783 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-14.464261, -32.86824, -9.635263, 3.6567686, -22.331694, -36.610027, 6.420647, 12.201013, -62.60314, -58.56474]
2025-05-09 03:19:34,783 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [53.0, 52.0, 33.0, 18.0, 136.0, 136.0, 79.0, 35.0, 84.0, 94.0]
2025-05-09 03:19:34,783 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1226 [INFO]: New best (-21.48) for latency MM1Queue_a033_s075
2025-05-09 03:19:34,784 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1229 [INFO]: saving network
2025-05-09 03:19:34,787 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-sac/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 03:19:34,795 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 15/100 (estimated time remaining: 3 hours, 46 minutes, 33 seconds)
2025-05-09 03:22:19,250 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 03:22:23,449 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -40.54054 ± 102.973
2025-05-09 03:22:23,450 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [3.0376153, -7.7599044, 23.599922, 45.116425, 13.430629, -241.08388, 3.6017663, -7.3384347, -247.68088, 9.671326]
2025-05-09 03:22:23,450 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [44.0, 27.0, 79.0, 59.0, 39.0, 1000.0, 24.0, 85.0, 1000.0, 52.0]
2025-05-09 03:22:23,453 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 16/100 (estimated time remaining: 3 hours, 48 minutes, 11 seconds)
2025-05-09 03:24:56,101 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 03:24:58,776 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -43.27640 ± 61.518
2025-05-09 03:24:58,776 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-42.270554, -5.663648, 9.7706, -36.98406, -170.55597, -3.644163, 3.015607, -153.76091, -17.639715, -15.03118]
2025-05-09 03:24:58,776 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [59.0, 43.0, 33.0, 63.0, 1000.0, 92.0, 66.0, 136.0, 39.0, 75.0]
2025-05-09 03:24:58,780 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 17/100 (estimated time remaining: 3 hours, 44 minutes, 26 seconds)
2025-05-09 03:27:36,494 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 03:27:37,533 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -10.90573 ± 35.486
2025-05-09 03:27:37,534 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-87.5038, -16.55857, 19.05635, 63.38591, -4.9620285, -19.817968, -15.304453, -14.984712, -22.08658, -10.28144]
2025-05-09 03:27:37,534 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [121.0, 34.0, 73.0, 83.0, 35.0, 45.0, 97.0, 48.0, 30.0, 142.0]
2025-05-09 03:27:37,534 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1226 [INFO]: New best (-10.91) for latency MM1Queue_a033_s075
2025-05-09 03:27:37,534 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1229 [INFO]: saving network
2025-05-09 03:27:37,538 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-sac/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 03:27:37,546 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 18/100 (estimated time remaining: 3 hours, 41 minutes, 20 seconds)
2025-05-09 03:30:30,237 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 03:30:31,277 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -12.31832 ± 25.286
2025-05-09 03:30:31,277 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [2.5563653, -36.783504, -65.62194, 9.642666, 28.237118, -10.885088, 0.8559949, -2.0711865, -20.984978, -28.128613]
2025-05-09 03:30:31,277 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [30.0, 162.0, 168.0, 42.0, 67.0, 14.0, 7.0, 23.0, 62.0, 135.0]
2025-05-09 03:30:31,281 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 19/100 (estimated time remaining: 3 hours, 39 minutes, 7 seconds)
2025-05-09 03:33:05,044 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 03:33:05,890 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -33.86335 ± 42.500
2025-05-09 03:33:05,891 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [18.255789, -1.0332229, -49.10504, -121.76032, -5.244694, -83.64067, -26.234453, -57.473064, 14.625772, -27.023525]
2025-05-09 03:33:05,891 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [17.0, 26.0, 49.0, 219.0, 27.0, 73.0, 73.0, 32.0, 24.0, 39.0]
2025-05-09 03:33:05,895 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 20/100 (estimated time remaining: 3 hours, 38 minutes, 59 seconds)
2025-05-09 03:35:35,315 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 03:35:36,726 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -39.71277 ± 39.302
2025-05-09 03:35:36,726 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-0.55246365, -89.42115, -17.167965, -67.44575, -108.10891, -28.815624, 3.5905545, -75.34762, -12.976422, -0.8823897]
2025-05-09 03:35:36,726 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [19.0, 139.0, 34.0, 129.0, 199.0, 109.0, 20.0, 166.0, 108.0, 32.0]
2025-05-09 03:35:36,732 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 21/100 (estimated time remaining: 3 hours, 31 minutes, 32 seconds)
2025-05-09 03:38:14,135 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 03:38:17,048 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -27.90322 ± 37.373
2025-05-09 03:38:17,048 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-65.375435, 7.073521, -10.548702, 9.823794, -109.19766, 0.5200091, -15.055172, -67.08457, -23.089724, -6.0982885]
2025-05-09 03:38:17,048 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [197.0, 73.0, 26.0, 64.0, 1000.0, 64.0, 52.0, 112.0, 70.0, 95.0]
2025-05-09 03:38:17,052 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 22/100 (estimated time remaining: 3 hours, 30 minutes, 12 seconds)
2025-05-09 03:40:54,928 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 03:40:55,639 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -12.83411 ± 20.719
2025-05-09 03:40:55,639 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-3.8111987, -16.202793, -12.936573, -10.465389, -41.98632, 13.457547, 11.499278, -56.147522, -12.518512, 0.7703746]
2025-05-09 03:40:55,639 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [36.0, 44.0, 70.0, 80.0, 49.0, 73.0, 27.0, 64.0, 35.0, 13.0]
2025-05-09 03:40:55,644 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 23/100 (estimated time remaining: 3 hours, 27 minutes, 30 seconds)
2025-05-09 03:43:33,772 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 03:43:36,022 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -41.93171 ± 66.178
2025-05-09 03:43:36,022 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-4.4826336, -20.510736, -58.345215, -19.679577, -6.7213135, -37.485226, 3.5707324, -21.494461, -20.032612, -234.13606]
2025-05-09 03:43:36,022 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [21.0, 43.0, 89.0, 22.0, 22.0, 39.0, 8.0, 36.0, 27.0, 1000.0]
2025-05-09 03:43:36,026 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 24/100 (estimated time remaining: 3 hours, 21 minutes, 25 seconds)
2025-05-09 03:46:13,171 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 03:46:13,693 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -0.57547 ± 8.665
2025-05-09 03:46:13,694 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-15.142346, 2.1884174, -2.4937496, 6.339463, 13.392979, 0.17337477, 0.88699484, -6.9116898, -12.977012, 8.788849]
2025-05-09 03:46:13,694 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [48.0, 66.0, 34.0, 42.0, 70.0, 13.0, 9.0, 17.0, 23.0, 39.0]
2025-05-09 03:46:13,694 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1226 [INFO]: New best (-0.58) for latency MM1Queue_a033_s075
2025-05-09 03:46:13,694 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1229 [INFO]: saving network
2025-05-09 03:46:13,698 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-sac/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 03:46:13,707 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 25/100 (estimated time remaining: 3 hours, 19 minutes, 34 seconds)
2025-05-09 03:48:54,614 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 03:48:55,447 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -29.76084 ± 63.792
2025-05-09 03:48:55,447 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-11.410617, -212.83734, 10.547163, 0.62923115, 6.9069815, -30.273235, -54.976913, -2.9854128, -3.6695008, 0.46127248]
2025-05-09 03:48:55,447 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [65.0, 205.0, 51.0, 33.0, 12.0, 36.0, 63.0, 18.0, 16.0, 75.0]
2025-05-09 03:48:55,452 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 26/100 (estimated time remaining: 3 hours, 19 minutes, 40 seconds)
2025-05-09 03:51:30,274 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 03:51:32,652 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -27.95580 ± 64.142
2025-05-09 03:51:32,653 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [7.544823, -204.74707, 0.6695113, -2.3260298, -5.7241106, -22.018044, -0.16409785, -69.11485, -19.633589, 35.95549]
2025-05-09 03:51:32,653 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [28.0, 1000.0, 12.0, 14.0, 16.0, 14.0, 22.0, 162.0, 50.0, 72.0]
2025-05-09 03:51:32,658 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 27/100 (estimated time remaining: 3 hours, 16 minutes, 14 seconds)
2025-05-09 03:54:20,778 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 03:54:21,843 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -14.42274 ± 31.051
2025-05-09 03:54:21,843 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-67.41728, -12.583984, -21.631191, 4.19846, 41.072544, 4.6256757, 4.722846, -22.6186, -9.838194, -64.757706]
2025-05-09 03:54:21,843 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [181.0, 45.0, 32.0, 111.0, 33.0, 20.0, 21.0, 34.0, 18.0, 231.0]
2025-05-09 03:54:21,847 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 28/100 (estimated time remaining: 3 hours, 16 minutes, 10 seconds)
2025-05-09 03:56:47,333 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 03:56:48,319 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -13.49560 ± 37.394
2025-05-09 03:56:48,319 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-67.7309, -14.603611, 1.9987146, 9.545212, -96.35721, -22.614033, 14.181457, 15.403753, 25.948448, -0.7278043]
2025-05-09 03:56:48,319 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [150.0, 30.0, 18.0, 83.0, 128.0, 112.0, 32.0, 39.0, 63.0, 24.0]
2025-05-09 03:56:48,324 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 29/100 (estimated time remaining: 3 hours, 10 minutes, 9 seconds)
2025-05-09 03:59:25,221 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 03:59:27,755 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -28.08671 ± 43.572
2025-05-09 03:59:27,755 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-40.358143, -21.985619, -11.830698, 7.018436, 1.956681, -64.96258, 9.573362, -1.3531165, -17.713467, -141.21193]
2025-05-09 03:59:27,756 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [96.0, 57.0, 48.0, 38.0, 18.0, 88.0, 28.0, 31.0, 35.0, 1000.0]
2025-05-09 03:59:27,761 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 30/100 (estimated time remaining: 3 hours, 7 minutes, 55 seconds)
2025-05-09 04:02:02,991 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 04:02:05,429 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -22.26193 ± 42.699
2025-05-09 04:02:05,429 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [4.823813, -3.0573618, -21.368944, 13.696946, 0.48908016, -142.50499, -7.709217, -5.1632223, -41.06942, -20.756027]
2025-05-09 04:02:05,430 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [73.0, 20.0, 19.0, 16.0, 51.0, 1000.0, 36.0, 66.0, 117.0, 22.0]
2025-05-09 04:02:05,435 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 31/100 (estimated time remaining: 3 hours, 4 minutes, 19 seconds)
2025-05-09 04:04:46,826 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 04:04:47,663 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -23.17104 ± 51.051
2025-05-09 04:04:47,663 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [8.244349, 8.423848, -9.054817, 2.820028, -2.4054685, -45.913044, -168.44807, -26.437717, 3.182073, -2.1216183]
2025-05-09 04:04:47,663 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [29.0, 52.0, 29.0, 70.0, 36.0, 83.0, 203.0, 25.0, 28.0, 18.0]
2025-05-09 04:04:47,669 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 32/100 (estimated time remaining: 3 hours, 2 minutes, 51 seconds)
2025-05-09 04:07:22,202 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 04:07:23,072 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -37.04898 ± 39.779
2025-05-09 04:07:23,072 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [13.525916, -53.11464, -4.506617, -118.23519, 2.096945, -51.481068, -35.959682, -64.86275, -65.79671, 7.844019]
2025-05-09 04:07:23,073 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [26.0, 91.0, 20.0, 130.0, 29.0, 91.0, 42.0, 60.0, 91.0, 22.0]
2025-05-09 04:07:23,079 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 57 minutes, 4 seconds)
2025-05-09 04:10:10,331 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 04:10:11,022 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -29.40661 ± 35.380
2025-05-09 04:10:11,022 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-83.419205, -4.891992, -2.917, -7.1141295, 2.2541182, -80.06994, -85.62081, -6.5378847, -13.732155, -12.01713]
2025-05-09 04:10:11,022 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [59.0, 27.0, 12.0, 76.0, 54.0, 64.0, 117.0, 12.0, 37.0, 21.0]
2025-05-09 04:10:11,028 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 59 minutes, 16 seconds)
2025-05-09 04:12:44,334 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 04:12:47,153 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -78.13173 ± 123.300
2025-05-09 04:12:47,153 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-23.470789, -6.683433, -79.83869, -111.36375, -22.489252, -11.718439, -434.70468, -22.932112, -9.109515, -59.006638]
2025-05-09 04:12:47,153 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [142.0, 61.0, 109.0, 175.0, 42.0, 27.0, 1000.0, 41.0, 39.0, 66.0]
2025-05-09 04:12:47,160 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 55 minutes, 52 seconds)
2025-05-09 04:15:19,708 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 04:15:20,365 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -13.71302 ± 20.600
2025-05-09 04:15:20,365 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [13.299422, -17.613007, -9.638666, -9.58512, -38.322014, 1.3356947, -60.639454, 2.479092, -1.2106713, -17.235514]
2025-05-09 04:15:20,365 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [24.0, 23.0, 15.0, 36.0, 102.0, 34.0, 41.0, 30.0, 83.0, 70.0]
2025-05-09 04:15:20,372 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 52 minutes, 14 seconds)
2025-05-09 04:17:58,507 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 04:17:59,053 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -10.02557 ± 13.439
2025-05-09 04:17:59,053 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-20.16487, -30.033796, -1.1772608, -1.4209715, -7.3335314, -36.474186, 6.24917, -6.6735706, 2.0138283, -5.240497]
2025-05-09 04:17:59,053 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [33.0, 101.0, 20.0, 14.0, 38.0, 66.0, 23.0, 16.0, 34.0, 36.0]
2025-05-09 04:17:59,060 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 48 minutes, 49 seconds)
2025-05-09 04:20:38,483 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 04:20:39,109 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -16.98140 ± 21.537
2025-05-09 04:20:39,109 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-14.338354, -65.08075, -5.655353, -2.7058578, -36.65439, -9.06222, 3.6898012, -35.471046, 10.652432, -15.18829]
2025-05-09 04:20:39,109 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [17.0, 99.0, 61.0, 14.0, 57.0, 73.0, 41.0, 22.0, 25.0, 26.0]
2025-05-09 04:20:39,116 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 47 minutes, 10 seconds)
2025-05-09 04:23:19,337 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 04:23:23,723 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -22.25623 ± 54.788
2025-05-09 04:23:23,723 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-11.026168, -79.15072, -1.5603266, 26.287844, -8.073237, 2.785716, -163.24507, -17.538818, 29.34025, -0.38177893]
2025-05-09 04:23:23,723 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [61.0, 82.0, 71.0, 33.0, 61.0, 44.0, 1000.0, 30.0, 1000.0, 32.0]
2025-05-09 04:23:23,730 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 43 minutes, 49 seconds)
2025-05-09 04:25:56,245 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 04:25:56,903 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -1.97346 ± 13.981
2025-05-09 04:25:56,903 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [16.822245, -8.374848, 1.1777517, 0.062337782, 16.430138, -6.5925326, -30.383179, 12.158078, -14.6034975, -6.431071]
2025-05-09 04:25:56,903 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [44.0, 78.0, 108.0, 15.0, 25.0, 21.0, 62.0, 34.0, 29.0, 39.0]
2025-05-09 04:25:56,910 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 40 minutes, 34 seconds)
2025-05-09 04:28:44,057 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 04:28:46,570 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -40.91488 ± 69.459
2025-05-09 04:28:46,570 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [3.9371662, -1.5922265, -13.1782465, -7.0560465, -1.1012949, -31.256071, -8.175188, -237.56737, -77.29542, -35.86412]
2025-05-09 04:28:46,570 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [43.0, 35.0, 71.0, 37.0, 35.0, 36.0, 23.0, 1000.0, 99.0, 104.0]
2025-05-09 04:28:46,577 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 41/100 (estimated time remaining: 2 hours, 41 minutes, 14 seconds)
2025-05-09 04:31:28,880 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 04:31:29,646 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -23.36037 ± 34.686
2025-05-09 04:31:29,647 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [4.765086, -62.24367, -6.5411563, 4.644974, -105.77573, -8.961583, -18.481121, -1.1585674, -43.604324, 3.7524285]
2025-05-09 04:31:29,647 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [30.0, 72.0, 27.0, 19.0, 181.0, 57.0, 40.0, 23.0, 64.0, 16.0]
2025-05-09 04:31:29,654 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 42/100 (estimated time remaining: 2 hours, 39 minutes, 25 seconds)
2025-05-09 04:34:04,329 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 04:34:05,154 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -18.20651 ± 24.228
2025-05-09 04:34:05,154 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-41.991688, -6.0454745, 8.784455, -0.16365261, -2.970156, -4.593877, -78.14502, -24.294727, -21.216063, -11.428913]
2025-05-09 04:34:05,154 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [49.0, 13.0, 24.0, 19.0, 45.0, 118.0, 147.0, 67.0, 64.0, 23.0]
2025-05-09 04:34:05,161 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 43/100 (estimated time remaining: 2 hours, 35 minutes, 50 seconds)
2025-05-09 04:36:36,310 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 04:36:37,197 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -16.70183 ± 14.429
2025-05-09 04:36:37,197 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [0.35073313, -17.706604, -28.993866, -13.602335, -1.6741855, -29.878084, -47.556778, -16.456419, -1.3855568, -10.11521]
2025-05-09 04:36:37,198 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [21.0, 61.0, 46.0, 85.0, 27.0, 85.0, 96.0, 40.0, 33.0, 119.0]
2025-05-09 04:36:37,205 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 44/100 (estimated time remaining: 2 hours, 30 minutes, 45 seconds)
2025-05-09 04:39:11,234 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 04:39:17,412 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -95.05879 ± 110.572
2025-05-09 04:39:17,413 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-180.1235, -85.71101, -47.69, -289.91284, -27.971838, -290.29388, 13.596822, -7.7510366, 0.070795536, -34.80136]
2025-05-09 04:39:17,413 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 65.0, 40.0, 1000.0, 36.0, 1000.0, 38.0, 42.0, 40.0, 69.0]
2025-05-09 04:39:17,421 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 45/100 (estimated time remaining: 2 hours, 29 minutes, 25 seconds)
2025-05-09 04:42:08,329 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 04:42:08,944 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -12.00127 ± 25.537
2025-05-09 04:42:08,944 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-2.0228176, -56.983585, 38.34649, -4.318246, -51.736877, -10.382211, -0.89948404, -18.60334, -6.759318, -6.6532946]
2025-05-09 04:42:08,945 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [25.0, 57.0, 68.0, 35.0, 51.0, 34.0, 49.0, 56.0, 30.0, 26.0]
2025-05-09 04:42:08,953 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 46/100 (estimated time remaining: 2 hours, 27 minutes, 6 seconds)
2025-05-09 04:44:36,427 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 04:44:37,145 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -16.32785 ± 18.591
2025-05-09 04:44:37,145 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [5.130628, -16.363012, -18.034746, -40.221664, -6.5412226, -17.984829, -11.079332, -56.98582, -9.201633, 8.00307]
2025-05-09 04:44:37,145 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [37.0, 25.0, 31.0, 127.0, 72.0, 54.0, 29.0, 57.0, 30.0, 38.0]
2025-05-09 04:44:37,153 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 47/100 (estimated time remaining: 2 hours, 21 minutes, 44 seconds)
2025-05-09 04:47:21,429 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 04:47:22,299 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -25.42684 ± 23.149
2025-05-09 04:47:22,299 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-3.904669, -8.459102, -19.81032, -89.020615, -33.038296, -20.039732, -8.022734, -27.721218, -14.542772, -29.70899]
2025-05-09 04:47:22,299 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [44.0, 14.0, 47.0, 162.0, 39.0, 117.0, 27.0, 70.0, 58.0, 21.0]
2025-05-09 04:47:22,308 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 48/100 (estimated time remaining: 2 hours, 20 minutes, 49 seconds)
2025-05-09 04:49:52,571 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 04:49:53,144 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -5.98155 ± 26.337
2025-05-09 04:49:53,144 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [0.8316494, -6.5736094, -9.036425, -12.782797, -0.47587904, -17.47506, 59.534428, -33.629494, 4.5545144, -44.76279]
2025-05-09 04:49:53,144 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [57.0, 26.0, 17.0, 33.0, 37.0, 39.0, 40.0, 58.0, 35.0, 56.0]
2025-05-09 04:49:53,153 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 49/100 (estimated time remaining: 2 hours, 17 minutes, 57 seconds)
2025-05-09 04:52:35,961 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 04:52:36,727 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -33.32484 ± 42.428
2025-05-09 04:52:36,727 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-8.795708, -81.205475, -2.8648012, -66.63997, -24.429523, 15.198943, -14.778525, -12.007403, -8.400123, -129.32576]
2025-05-09 04:52:36,727 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [35.0, 84.0, 27.0, 49.0, 80.0, 93.0, 34.0, 38.0, 12.0, 80.0]
2025-05-09 04:52:36,736 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 50/100 (estimated time remaining: 2 hours, 15 minutes, 53 seconds)
2025-05-09 04:55:10,640 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 04:55:11,412 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -21.55858 ± 26.125
2025-05-09 04:55:11,412 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-7.35329, -11.180847, -18.214548, -24.099382, -27.98574, 17.83953, -18.149057, -13.5411005, -91.04426, -21.8571]
2025-05-09 04:55:11,412 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [38.0, 20.0, 23.0, 73.0, 54.0, 103.0, 28.0, 81.0, 81.0, 35.0]
2025-05-09 04:55:11,421 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 51/100 (estimated time remaining: 2 hours, 10 minutes, 24 seconds)
2025-05-09 04:58:02,341 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 04:58:03,042 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -21.58212 ± 26.057
2025-05-09 04:58:03,042 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [7.6186104, 2.5773995, -49.327827, 0.25297663, -23.51739, -67.67751, -56.70821, -4.1589527, -0.7400513, -24.140253]
2025-05-09 04:58:03,042 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [18.0, 25.0, 73.0, 10.0, 99.0, 71.0, 94.0, 12.0, 32.0, 49.0]
2025-05-09 04:58:03,050 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 52/100 (estimated time remaining: 2 hours, 11 minutes, 37 seconds)
2025-05-09 05:00:41,303 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 05:00:41,876 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -20.72863 ± 28.626
2025-05-09 05:00:41,876 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-62.17847, -65.75994, 5.904039, -57.863552, 3.7392921, 0.90580153, -1.0417957, -20.190914, -20.1616, 9.360848]
2025-05-09 05:00:41,876 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [42.0, 76.0, 14.0, 27.0, 32.0, 36.0, 32.0, 68.0, 36.0, 33.0]
2025-05-09 05:00:41,885 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 53/100 (estimated time remaining: 2 hours, 7 minutes, 55 seconds)
2025-05-09 05:03:18,806 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 05:03:19,648 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -19.81520 ± 14.644
2025-05-09 05:03:19,648 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-27.055964, -27.330343, 0.07810744, -25.971184, -19.265993, -5.8055863, -26.297009, 7.421748, -41.043003, -32.88273]
2025-05-09 05:03:19,648 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [21.0, 96.0, 46.0, 20.0, 56.0, 31.0, 40.0, 19.0, 152.0, 94.0]
2025-05-09 05:03:19,657 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 54/100 (estimated time remaining: 2 hours, 6 minutes, 21 seconds)
2025-05-09 05:05:44,281 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 05:05:48,575 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -36.16267 ± 65.069
2025-05-09 05:05:48,575 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-32.78532, -17.734661, 2.135374, -148.8926, -178.09242, 2.4259553, 9.807239, -4.292796, 2.363796, 3.438762]
2025-05-09 05:05:48,576 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [34.0, 112.0, 84.0, 1000.0, 1000.0, 23.0, 22.0, 25.0, 18.0, 21.0]
2025-05-09 05:05:48,585 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 55/100 (estimated time remaining: 2 hours, 1 minute, 25 seconds)
2025-05-09 05:08:27,822 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 05:08:28,464 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -14.69784 ± 19.987
2025-05-09 05:08:28,465 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [14.044711, -46.586964, 15.098677, -37.636276, -32.52295, -5.1375203, -4.792632, -8.065769, -28.267439, -13.112276]
2025-05-09 05:08:28,465 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [42.0, 58.0, 35.0, 70.0, 36.0, 16.0, 34.0, 73.0, 48.0, 33.0]
2025-05-09 05:08:28,474 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 59 minutes, 33 seconds)
2025-05-09 05:11:06,458 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 05:11:07,045 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -17.00346 ± 19.594
2025-05-09 05:11:07,045 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [6.5302253, -19.440111, -18.331936, -7.5903816, -61.455986, -30.208244, 12.76941, -23.63225, -21.200014, -7.47535]
2025-05-09 05:11:07,045 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [43.0, 38.0, 21.0, 22.0, 64.0, 46.0, 21.0, 64.0, 67.0, 20.0]
2025-05-09 05:11:07,055 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 54 minutes, 59 seconds)
2025-05-09 05:13:44,083 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 05:13:48,582 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -17.85438 ± 37.289
2025-05-09 05:13:48,582 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-63.42323, 68.06785, -3.7160468, -19.303019, -7.032762, -5.1555157, -55.4707, -16.375484, -67.4855, -8.649404]
2025-05-09 05:13:48,583 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [84.0, 1000.0, 13.0, 70.0, 60.0, 71.0, 1000.0, 24.0, 50.0, 44.0]
2025-05-09 05:13:48,592 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 52 minutes, 45 seconds)
2025-05-09 05:16:26,208 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 05:16:28,909 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -52.15922 ± 94.801
2025-05-09 05:16:28,909 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-40.426716, -27.221243, -27.4987, -53.871506, -5.3243446, -15.272303, 0.538365, -332.527, -11.9970455, -7.9916573]
2025-05-09 05:16:28,909 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [196.0, 30.0, 61.0, 62.0, 16.0, 56.0, 34.0, 1000.0, 48.0, 34.0]
2025-05-09 05:16:28,919 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 50 minutes, 29 seconds)
2025-05-09 05:19:09,552 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 05:19:10,100 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -24.34026 ± 22.958
2025-05-09 05:19:10,100 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-26.108107, -6.602458, -59.565742, -10.744781, -5.7548776, -15.15676, 3.3353057, -72.45514, -22.422628, -27.927462]
2025-05-09 05:19:10,100 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [38.0, 35.0, 67.0, 14.0, 27.0, 34.0, 28.0, 55.0, 47.0, 34.0]
2025-05-09 05:19:10,110 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 49 minutes, 32 seconds)
2025-05-09 05:21:45,263 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 05:21:45,792 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -10.50478 ± 20.993
2025-05-09 05:21:45,792 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [0.036325723, -68.08127, -0.52478385, -9.310857, 17.789865, -10.997765, -5.3776016, -14.267702, -4.328725, -9.985234]
2025-05-09 05:21:45,792 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [28.0, 55.0, 26.0, 23.0, 18.0, 58.0, 41.0, 65.0, 23.0, 31.0]
2025-05-09 05:21:45,803 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 46 minutes, 18 seconds)
2025-05-09 05:24:22,724 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 05:24:23,456 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -27.30018 ± 41.425
2025-05-09 05:24:23,457 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-110.126884, -57.29071, -93.00914, 10.707271, -17.978695, -6.068951, 7.197583, -1.5194474, -4.889747, -0.02303719]
2025-05-09 05:24:23,457 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [103.0, 75.0, 64.0, 41.0, 29.0, 11.0, 90.0, 15.0, 59.0, 20.0]
2025-05-09 05:24:23,467 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 43 minutes, 32 seconds)
2025-05-09 05:27:01,751 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 05:27:02,725 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -26.68575 ± 34.884
2025-05-09 05:27:02,726 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [1.8019692, -36.84632, -43.620537, -11.194683, 1.9460182, -0.09197138, -107.74472, -60.651714, -21.894573, 11.439004]
2025-05-09 05:27:02,726 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [18.0, 47.0, 82.0, 22.0, 19.0, 22.0, 208.0, 182.0, 39.0, 24.0]
2025-05-09 05:27:02,736 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 40 minutes, 35 seconds)
2025-05-09 05:29:38,777 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 05:29:39,162 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -12.77680 ± 17.407
2025-05-09 05:29:39,162 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [2.9168236, -30.891153, -4.8613086, 5.336994, -7.682589, -3.8679848, -53.20461, -7.857271, -1.6098363, -26.047108]
2025-05-09 05:29:39,162 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [14.0, 45.0, 15.0, 19.0, 18.0, 39.0, 56.0, 16.0, 22.0, 25.0]
2025-05-09 05:29:39,172 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 37 minutes, 27 seconds)
2025-05-09 05:32:21,242 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 05:32:21,922 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -10.21498 ± 15.577
2025-05-09 05:32:21,922 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-4.4649153, -22.032274, 16.444857, -31.110302, -7.9756517, -3.6561575, -34.76925, -19.015371, 8.083802, -3.6544876]
2025-05-09 05:32:21,922 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [60.0, 37.0, 40.0, 45.0, 46.0, 64.0, 75.0, 26.0, 39.0, 37.0]
2025-05-09 05:32:21,933 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 35 minutes, 1 second)
2025-05-09 05:34:55,188 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 05:34:55,902 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -24.17917 ± 45.148
2025-05-09 05:34:55,902 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [14.8058195, -11.014954, -40.770138, -122.0323, -78.3829, -1.0703791, 2.40555, -44.867565, 5.3309946, 33.804188]
2025-05-09 05:34:55,902 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [32.0, 36.0, 42.0, 90.0, 154.0, 14.0, 7.0, 40.0, 13.0, 65.0]
2025-05-09 05:34:55,913 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 32 minutes, 10 seconds)
2025-05-09 05:37:44,103 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 05:37:44,662 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -12.33523 ± 18.443
2025-05-09 05:37:44,662 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-13.327521, -0.24514255, -13.072116, -39.25529, -15.92997, 27.13866, -3.1026595, -34.15817, -29.371588, -2.0284855]
2025-05-09 05:37:44,662 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [47.0, 17.0, 14.0, 72.0, 39.0, 52.0, 41.0, 25.0, 24.0, 60.0]
2025-05-09 05:37:44,673 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 30 minutes, 48 seconds)
2025-05-09 05:40:24,310 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 05:40:24,963 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -12.33906 ± 20.409
2025-05-09 05:40:24,963 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-15.695805, 6.415208, -1.7243865, 8.308803, 0.12316959, -3.6046643, -40.147728, -12.978072, -4.826879, -59.260204]
2025-05-09 05:40:24,963 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [27.0, 22.0, 13.0, 36.0, 39.0, 70.0, 115.0, 39.0, 17.0, 72.0]
2025-05-09 05:40:24,975 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 28 minutes, 14 seconds)
2025-05-09 05:42:48,442 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 05:42:48,925 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -5.83305 ± 13.386
2025-05-09 05:42:48,925 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-4.440588, -18.463045, 8.031278, -6.0506773, -6.083536, 15.014812, -2.7322326, -35.79282, -10.575386, 2.7616668]
2025-05-09 05:42:48,925 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [12.0, 16.0, 70.0, 39.0, 35.0, 22.0, 34.0, 66.0, 18.0, 25.0]
2025-05-09 05:42:48,936 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 24 minutes, 14 seconds)
2025-05-09 05:45:28,085 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 05:45:28,583 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -17.69012 ± 25.977
2025-05-09 05:45:28,583 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-8.598755, -46.863445, -11.648469, 0.74608254, 22.477098, -11.330122, -77.803474, -12.925208, -23.77891, -7.176007]
2025-05-09 05:45:28,583 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [30.0, 65.0, 31.0, 23.0, 46.0, 20.0, 55.0, 18.0, 27.0, 33.0]
2025-05-09 05:45:28,595 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 21 minutes, 17 seconds)
2025-05-09 05:48:03,641 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 05:48:04,265 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -18.66697 ± 36.314
2025-05-09 05:48:04,265 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-46.804848, -6.9521103, -11.899056, -21.185238, -112.744545, -15.537686, -14.378847, 7.7551727, 19.436668, 15.640801]
2025-05-09 05:48:04,265 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [52.0, 20.0, 29.0, 21.0, 85.0, 15.0, 71.0, 29.0, 61.0, 48.0]
2025-05-09 05:48:04,276 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 18 minutes, 50 seconds)
2025-05-09 05:50:41,571 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 05:50:44,068 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -39.55500 ± 52.421
2025-05-09 05:50:44,068 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-34.95484, 7.002811, 3.1478965, -85.07757, -96.08793, 4.9545875, 10.01011, -154.6618, -18.457989, -31.425333]
2025-05-09 05:50:44,068 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [53.0, 35.0, 16.0, 93.0, 90.0, 30.0, 26.0, 1000.0, 13.0, 29.0]
2025-05-09 05:50:44,080 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 15 minutes, 20 seconds)
2025-05-09 05:53:21,780 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 05:53:22,252 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -17.51772 ± 26.696
2025-05-09 05:53:22,252 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [8.030427, 9.374428, -13.204124, -74.54002, 2.4176319, -5.508896, -33.108162, -55.636093, -6.345554, -6.6568227]
2025-05-09 05:53:22,252 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [16.0, 14.0, 61.0, 82.0, 14.0, 33.0, 49.0, 30.0, 15.0, 16.0]
2025-05-09 05:53:22,264 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 12 minutes, 32 seconds)
2025-05-09 05:55:57,353 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 05:55:57,931 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -7.82632 ± 11.141
2025-05-09 05:55:57,931 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-24.171183, -15.662188, 11.033299, 5.9133124, -5.3310914, 0.2442298, -6.8632493, -7.2865295, -24.619564, -11.520235]
2025-05-09 05:55:57,932 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [44.0, 21.0, 56.0, 25.0, 36.0, 52.0, 31.0, 20.0, 54.0, 63.0]
2025-05-09 05:55:57,944 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 11 minutes)
2025-05-09 05:58:35,061 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 05:58:35,447 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -8.81573 ± 16.938
2025-05-09 05:58:35,448 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [9.376356, -9.393652, -51.037735, -21.154232, 0.60084575, -0.47746086, 12.008556, -9.052373, -7.284996, -11.742617]
2025-05-09 05:58:35,448 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [19.0, 28.0, 42.0, 53.0, 11.0, 17.0, 20.0, 33.0, 28.0, 18.0]
2025-05-09 05:58:35,460 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 8 minutes, 11 seconds)
2025-05-09 06:01:11,045 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 06:01:11,415 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -14.44688 ± 30.509
2025-05-09 06:01:11,415 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-1.4725325, -101.35774, 0.0078412015, 9.83653, -3.2433107, -23.516779, 4.980627, -16.300037, -12.993269, -0.4101297]
2025-05-09 06:01:11,415 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [23.0, 66.0, 11.0, 16.0, 15.0, 40.0, 24.0, 22.0, 19.0, 20.0]
2025-05-09 06:01:11,428 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 5 minutes, 35 seconds)
2025-05-09 06:03:48,502 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 06:03:48,963 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -20.59392 ± 15.593
2025-05-09 06:03:48,963 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [1.9309094, -36.081924, -3.0974486, -24.714296, -38.58154, -11.370586, -11.466891, -21.94574, -48.727867, -11.8838215]
2025-05-09 06:03:48,964 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [22.0, 46.0, 12.0, 26.0, 55.0, 30.0, 15.0, 46.0, 48.0, 20.0]
2025-05-09 06:03:48,976 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 2 minutes, 47 seconds)
2025-05-09 06:06:24,392 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 06:06:26,689 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -16.26628 ± 23.090
2025-05-09 06:06:26,689 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-7.242471, -16.044998, -73.19233, -40.397446, -19.138247, 7.116162, -6.087059, -7.6971107, -9.077629, 9.098301]
2025-05-09 06:06:26,689 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [26.0, 14.0, 1000.0, 49.0, 22.0, 23.0, 28.0, 26.0, 17.0, 25.0]
2025-05-09 06:06:26,702 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 8 seconds)
2025-05-09 06:09:01,387 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 06:09:01,943 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -5.99166 ± 15.272
2025-05-09 06:09:01,943 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-30.754686, -23.532299, -11.177926, 4.790404, -13.70929, 16.091774, 12.526687, -20.2643, 0.73451924, 5.378558]
2025-05-09 06:09:01,943 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [39.0, 41.0, 32.0, 21.0, 85.0, 24.0, 27.0, 36.0, 15.0, 67.0]
2025-05-09 06:09:01,956 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 79/100 (estimated time remaining: 57 minutes, 29 seconds)
2025-05-09 06:11:35,698 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 06:11:36,042 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -9.37163 ± 10.813
2025-05-09 06:11:36,042 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [9.675036, -0.8136291, -3.706384, -18.99819, -7.333503, -18.72343, -8.172555, -9.366228, -31.584282, -4.6930976]
2025-05-09 06:11:36,042 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [39.0, 16.0, 14.0, 28.0, 29.0, 19.0, 26.0, 19.0, 32.0, 21.0]
2025-05-09 06:11:36,055 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 80/100 (estimated time remaining: 54 minutes, 38 seconds)
2025-05-09 06:14:16,380 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 06:14:16,970 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -23.16762 ± 25.743
2025-05-09 06:14:16,970 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-26.378538, -19.937918, -16.767094, -1.7543811, -9.49909, -8.361991, -16.359276, -2.4012625, -94.30559, -35.911034]
2025-05-09 06:14:16,970 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [66.0, 28.0, 32.0, 16.0, 36.0, 11.0, 39.0, 41.0, 96.0, 45.0]
2025-05-09 06:14:16,984 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 81/100 (estimated time remaining: 52 minutes, 22 seconds)
2025-05-09 06:16:47,303 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 06:16:47,762 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -9.85448 ± 12.730
2025-05-09 06:16:47,763 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-13.09694, -0.1291114, -27.731882, -21.623352, -16.972603, -6.4706244, -4.806434, 8.57875, -25.812012, 9.51943]
2025-05-09 06:16:47,763 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [43.0, 15.0, 46.0, 33.0, 25.0, 15.0, 20.0, 27.0, 62.0, 36.0]
2025-05-09 06:16:47,776 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 82/100 (estimated time remaining: 49 minutes, 19 seconds)
2025-05-09 06:19:20,867 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 06:19:21,590 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -40.59886 ± 41.420
2025-05-09 06:19:21,590 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-23.44235, -110.43437, -23.647032, -120.483246, -36.129795, -8.107284, -25.378342, 7.1370106, -61.094944, -4.408267]
2025-05-09 06:19:21,591 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [48.0, 67.0, 41.0, 73.0, 36.0, 30.0, 53.0, 20.0, 67.0, 68.0]
2025-05-09 06:19:21,604 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 83/100 (estimated time remaining: 46 minutes, 29 seconds)
2025-05-09 06:21:54,731 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 06:21:55,312 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -7.44941 ± 13.595
2025-05-09 06:21:55,313 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-14.968172, -32.16484, 18.578516, 0.6696, -21.367403, 2.7206192, -9.039337, -5.829346, 1.6252428, -14.719031]
2025-05-09 06:21:55,313 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [19.0, 70.0, 39.0, 50.0, 69.0, 46.0, 46.0, 30.0, 14.0, 26.0]
2025-05-09 06:21:55,326 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 84/100 (estimated time remaining: 43 minutes, 49 seconds)
2025-05-09 06:24:30,868 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 06:24:31,376 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -21.87637 ± 22.519
2025-05-09 06:24:31,376 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-7.178547, -71.166626, -7.2188234, -58.684795, -8.134457, -20.155556, -23.919388, -9.432166, -2.6858728, -10.187488]
2025-05-09 06:24:31,376 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [27.0, 102.0, 13.0, 48.0, 16.0, 29.0, 58.0, 34.0, 12.0, 16.0]
2025-05-09 06:24:31,390 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 85/100 (estimated time remaining: 41 minutes, 21 seconds)
2025-05-09 06:27:05,649 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 06:27:06,251 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -17.66385 ± 20.621
2025-05-09 06:27:06,251 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-2.5127442, -13.778934, -46.691883, -19.250336, 5.761706, -8.470455, -64.39063, -13.119332, -14.227426, 0.04151325]
2025-05-09 06:27:06,251 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [40.0, 26.0, 68.0, 31.0, 18.0, 21.0, 93.0, 22.0, 86.0, 13.0]
2025-05-09 06:27:06,264 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 86/100 (estimated time remaining: 38 minutes, 27 seconds)
2025-05-09 06:29:41,513 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 06:29:42,152 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -33.66566 ± 32.426
2025-05-09 06:29:42,153 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-18.525753, -33.255547, -20.758785, -0.47386748, -62.52591, -20.268835, -101.976395, -71.64739, 3.0691903, -10.293359]
2025-05-09 06:29:42,153 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [31.0, 27.0, 22.0, 34.0, 112.0, 24.0, 103.0, 50.0, 19.0, 21.0]
2025-05-09 06:29:42,191 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 87/100 (estimated time remaining: 36 minutes, 8 seconds)
2025-05-09 06:32:17,218 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 06:32:17,661 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -11.45442 ± 13.252
2025-05-09 06:32:17,661 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-33.228268, -18.59061, -19.267784, 4.886005, 1.3698498, 7.2458177, -27.300617, -19.149025, -4.321734, -6.187837]
2025-05-09 06:32:17,661 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [48.0, 21.0, 34.0, 58.0, 21.0, 17.0, 57.0, 18.0, 18.0, 17.0]
2025-05-09 06:32:17,675 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 88/100 (estimated time remaining: 33 minutes, 37 seconds)
2025-05-09 06:35:02,626 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 06:35:03,174 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -11.54464 ± 25.670
2025-05-09 06:35:03,174 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-74.02688, -17.658638, -18.567572, 21.03143, 2.0153968, -14.115561, 18.796951, 5.5355496, -16.345507, -22.111607]
2025-05-09 06:35:03,174 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [61.0, 96.0, 24.0, 14.0, 27.0, 48.0, 33.0, 36.0, 21.0, 23.0]
2025-05-09 06:35:03,189 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 89/100 (estimated time remaining: 31 minutes, 30 seconds)
2025-05-09 06:37:36,397 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 06:37:36,852 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -17.25161 ± 17.534
2025-05-09 06:37:36,852 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-9.822318, -35.530422, -3.7020628, -22.710817, -0.004862209, -18.40914, 1.9770931, -53.84756, -0.13667534, -30.32938]
2025-05-09 06:37:36,852 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [17.0, 35.0, 66.0, 50.0, 16.0, 23.0, 8.0, 51.0, 13.0, 38.0]
2025-05-09 06:37:36,866 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 90/100 (estimated time remaining: 28 minutes, 48 seconds)
2025-05-09 06:40:03,576 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 06:40:04,267 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -38.66648 ± 32.882
2025-05-09 06:40:04,267 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-26.731384, 11.142595, -36.984074, -60.58904, -95.728584, -36.89373, -14.344498, -92.17999, -11.364493, -22.991613]
2025-05-09 06:40:04,267 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [78.0, 52.0, 26.0, 49.0, 99.0, 28.0, 19.0, 76.0, 24.0, 33.0]
2025-05-09 06:40:04,280 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 91/100 (estimated time remaining: 25 minutes, 56 seconds)
2025-05-09 06:42:37,109 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 06:42:37,556 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -3.88055 ± 16.963
2025-05-09 06:42:37,556 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [0.025109505, -8.783728, -33.82666, -9.634011, 31.246588, -24.04712, -3.7757342, 0.6679322, 11.769764, -2.4476457]
2025-05-09 06:42:37,556 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [16.0, 17.0, 36.0, 55.0, 42.0, 36.0, 29.0, 44.0, 18.0, 17.0]
2025-05-09 06:42:37,571 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 92/100 (estimated time remaining: 23 minutes, 15 seconds)
2025-05-09 06:45:10,833 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 06:45:11,279 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -18.64506 ± 30.781
2025-05-09 06:45:11,279 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-4.8253994, -0.33544523, -13.822687, -25.959637, -5.4944115, -17.087479, 9.118654, -106.79873, -7.0346146, -14.210813]
2025-05-09 06:45:11,279 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [16.0, 13.0, 24.0, 88.0, 27.0, 25.0, 12.0, 59.0, 18.0, 32.0]
2025-05-09 06:45:11,294 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 93/100 (estimated time remaining: 20 minutes, 37 seconds)
2025-05-09 06:47:46,431 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 06:47:46,907 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -8.99980 ± 17.584
2025-05-09 06:47:46,907 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-12.741251, 37.253628, -8.341988, -12.114462, -5.706507, -3.2993073, -7.595743, -30.239265, -21.960138, -25.253017]
2025-05-09 06:47:46,907 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [34.0, 54.0, 26.0, 24.0, 30.0, 25.0, 52.0, 37.0, 21.0, 30.0]
2025-05-09 06:47:46,922 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 94/100 (estimated time remaining: 17 minutes, 49 seconds)
2025-05-09 06:50:23,775 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 06:50:24,318 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -21.78695 ± 12.999
2025-05-09 06:50:24,319 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-21.501251, -25.224586, -16.057024, -25.335533, -19.4526, -11.44958, 0.4715357, -49.99675, -14.359807, -34.96388]
2025-05-09 06:50:24,319 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [28.0, 30.0, 23.0, 72.0, 33.0, 24.0, 11.0, 60.0, 45.0, 53.0]
2025-05-09 06:50:24,334 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 95/100 (estimated time remaining: 15 minutes, 20 seconds)
2025-05-09 06:53:00,393 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 06:53:00,814 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -18.18968 ± 14.842
2025-05-09 06:53:00,814 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-9.095066, -28.468906, -23.393002, -7.775822, -9.125031, -8.848277, -7.1749125, -16.002, -14.228795, -57.785]
2025-05-09 06:53:00,814 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [25.0, 18.0, 41.0, 32.0, 26.0, 28.0, 19.0, 52.0, 11.0, 42.0]
2025-05-09 06:53:00,829 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 96/100 (estimated time remaining: 12 minutes, 56 seconds)
2025-05-09 06:55:43,348 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 06:55:43,751 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -13.10707 ± 12.441
2025-05-09 06:55:43,751 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [1.6903154, 1.9503057, -27.256056, 2.818509, -14.381105, -15.90435, -14.606841, -7.8217607, -36.440292, -21.119402]
2025-05-09 06:55:43,751 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [15.0, 15.0, 68.0, 42.0, 24.0, 20.0, 33.0, 17.0, 29.0, 21.0]
2025-05-09 06:55:43,766 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 97/100 (estimated time remaining: 10 minutes, 28 seconds)
2025-05-09 06:58:11,912 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 06:58:12,339 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -14.15716 ± 31.950
2025-05-09 06:58:12,339 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-0.085267186, 1.0615531, -8.080073, -79.61942, -47.922848, 42.958687, 5.500871, -3.1421826, -14.718059, -37.52484]
2025-05-09 06:58:12,340 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [19.0, 17.0, 17.0, 72.0, 44.0, 37.0, 15.0, 36.0, 11.0, 30.0]
2025-05-09 06:58:12,355 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 98/100 (estimated time remaining: 7 minutes, 48 seconds)
2025-05-09 07:00:49,776 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 07:00:50,223 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -8.44647 ± 19.288
2025-05-09 07:00:50,223 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [14.361071, 17.555555, -12.467049, -7.77854, -4.720612, -32.43898, -4.1779833, -3.0566876, -0.735696, -51.005802]
2025-05-09 07:00:50,223 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [41.0, 28.0, 42.0, 57.0, 19.0, 27.0, 18.0, 23.0, 11.0, 45.0]
2025-05-09 07:00:50,239 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 99/100 (estimated time remaining: 5 minutes, 13 seconds)
2025-05-09 07:03:25,624 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 07:03:29,702 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -62.91086 ± 141.144
2025-05-09 07:03:29,702 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [11.42897, -93.076546, -478.60333, -2.6461654, -17.231623, -14.661177, 0.29474166, -15.038083, -9.03318, -10.542148]
2025-05-09 07:03:29,702 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [13.0, 1000.0, 1000.0, 13.0, 93.0, 44.0, 7.0, 17.0, 26.0, 15.0]
2025-05-09 07:03:29,717 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 37 seconds)
2025-05-09 07:06:03,317 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 07:06:05,482 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -52.79372 ± 128.258
2025-05-09 07:06:05,482 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-19.780676, -437.03473, -10.933467, -1.198378, -2.7892497, -18.674463, -0.08051648, -11.184513, -8.9382715, -17.322943]
2025-05-09 07:06:05,482 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [29.0, 1000.0, 13.0, 15.0, 13.0, 23.0, 7.0, 14.0, 22.0, 15.0]
2025-05-09 07:06:05,498 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1251 [DEBUG]: Training session finished
