2025-05-07 11:19:31,226 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1006 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-sac
2025-05-07 11:19:31,226 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1007 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-sac
2025-05-07 11:19:31,226 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1008 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x7b7cecbc6ca0>}
2025-05-07 11:19:31,226 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1009 [DEBUG]: using device: cpu
2025-05-07 11:19:31,229 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1031 [INFO]: Creating new trainer
2025-05-07 11:19:31,234 baseline-sac-noisy-ant:105 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=27, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1., -1., -1.]]))
)
2025-05-07 11:19:31,234 baseline-sac-noisy-ant:106 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=35, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-07 11:19:31,384 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1092 [DEBUG]: Starting training session...
2025-05-07 11:19:31,384 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 1/100
2025-05-07 11:21:32,692 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 11:21:33,785 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -116.06621 ± 200.125
2025-05-07 11:21:33,785 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-9.415707, -9.612203, -126.408, -71.51276, -13.950409, -67.77873, -81.512054, -39.042576, -706.96045, -34.4693]
2025-05-07 11:21:33,785 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [17.0, 14.0, 94.0, 48.0, 13.0, 70.0, 72.0, 39.0, 483.0, 25.0]
2025-05-07 11:21:33,785 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1124 [INFO]: New best (-116.07) for latency ExtremeSparseL4U32
2025-05-07 11:21:33,785 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1127 [INFO]: saving network
2025-05-07 11:21:33,789 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-sac/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 11:21:33,793 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 2/100 (estimated time remaining: 3 hours, 21 minutes, 58 seconds)
2025-05-07 11:23:51,589 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 11:23:57,055 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -149.91611 ± 226.419
2025-05-07 11:23:57,055 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [7.6150613, -478.67258, -12.277828, 3.6728926, -498.77484, -3.322758, -509.04428, -6.6256075, 5.9642, -7.6953487]
2025-05-07 11:23:57,055 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [79.0, 1000.0, 27.0, 24.0, 1000.0, 15.0, 1000.0, 31.0, 17.0, 56.0]
2025-05-07 11:23:57,056 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 36 minutes, 57 seconds)
2025-05-07 11:26:34,595 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 11:26:41,730 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -192.15034 ± 198.238
2025-05-07 11:26:41,731 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-277.38556, -526.69403, -94.30104, -453.47397, -13.045096, -66.67376, -432.08405, -37.103085, -24.79865, 4.0558496]
2025-05-07 11:26:41,731 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [680.0, 1000.0, 206.0, 1000.0, 102.0, 182.0, 1000.0, 89.0, 38.0, 29.0]
2025-05-07 11:26:41,732 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 51 minutes, 54 seconds)
2025-05-07 11:29:18,792 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 11:29:21,539 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -64.86174 ± 135.322
2025-05-07 11:29:21,540 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-31.326685, 10.184163, -463.6295, -13.851644, 3.4002116, -50.231464, -72.07322, -27.867998, 13.419586, -16.64078]
2025-05-07 11:29:21,540 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [182.0, 67.0, 1000.0, 37.0, 16.0, 167.0, 175.0, 34.0, 27.0, 38.0]
2025-05-07 11:29:21,540 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1124 [INFO]: New best (-64.86) for latency ExtremeSparseL4U32
2025-05-07 11:29:21,540 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1127 [INFO]: saving network
2025-05-07 11:29:21,544 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-sac/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 11:29:21,550 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 56 minutes, 3 seconds)
2025-05-07 11:32:11,034 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 11:32:12,054 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -47.22706 ± 39.682
2025-05-07 11:32:12,054 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [9.574227, -68.17179, -68.27811, -64.56977, -12.94717, -7.143186, -7.8529115, -92.533066, -117.46746, -42.881397]
2025-05-07 11:32:12,054 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [30.0, 117.0, 64.0, 115.0, 21.0, 21.0, 15.0, 64.0, 139.0, 152.0]
2025-05-07 11:32:12,054 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1124 [INFO]: New best (-47.23) for latency ExtremeSparseL4U32
2025-05-07 11:32:12,054 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1127 [INFO]: saving network
2025-05-07 11:32:12,058 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-sac/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 11:32:12,064 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 6/100 (estimated time remaining: 4 hours, 52 seconds)
2025-05-07 11:34:46,104 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 11:34:46,881 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -5.72093 ± 19.615
2025-05-07 11:34:46,882 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [0.0401358, 13.102479, -1.7512875, 5.257009, -4.929731, -30.011234, -53.215244, 10.336325, 9.086859, -5.12464]
2025-05-07 11:34:46,882 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [21.0, 52.0, 20.0, 19.0, 95.0, 142.0, 71.0, 34.0, 71.0, 31.0]
2025-05-07 11:34:46,882 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1124 [INFO]: New best (-5.72) for latency ExtremeSparseL4U32
2025-05-07 11:34:46,882 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1127 [INFO]: saving network
2025-05-07 11:34:46,886 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-sac/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 11:34:46,892 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 7/100 (estimated time remaining: 4 hours, 8 minutes, 30 seconds)
2025-05-07 11:37:32,996 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 11:37:34,473 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -24.84846 ± 32.503
2025-05-07 11:37:34,474 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-5.66234, -60.99113, -19.040995, -24.32123, -22.378622, -25.60938, 3.0226185, -97.33078, 28.241861, -24.414581]
2025-05-07 11:37:34,474 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [63.0, 101.0, 54.0, 121.0, 83.0, 88.0, 71.0, 114.0, 27.0, 324.0]
2025-05-07 11:37:34,475 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 8/100 (estimated time remaining: 4 hours, 13 minutes, 23 seconds)
2025-05-07 11:40:01,157 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 11:40:05,671 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -163.38075 ± 214.782
2025-05-07 11:40:05,671 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-81.60287, -14.456475, -75.015594, -615.3741, -180.09196, -545.8871, 0.2404233, -13.56159, -36.4438, -71.61442]
2025-05-07 11:40:05,671 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [96.0, 158.0, 65.0, 1000.0, 202.0, 1000.0, 52.0, 17.0, 58.0, 118.0]
2025-05-07 11:40:05,673 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 9/100 (estimated time remaining: 4 hours, 6 minutes, 32 seconds)
2025-05-07 11:42:55,729 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 11:42:58,276 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -77.23302 ± 155.472
2025-05-07 11:42:58,276 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [0.5099384, -10.3760605, -1.0616955, -18.464422, -20.107723, -61.667713, -535.6549, -96.200356, -21.651907, -7.6553187]
2025-05-07 11:42:58,276 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [41.0, 29.0, 29.0, 84.0, 127.0, 97.0, 1000.0, 92.0, 96.0, 18.0]
2025-05-07 11:42:58,278 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 10/100 (estimated time remaining: 4 hours, 7 minutes, 44 seconds)
2025-05-07 11:45:28,354 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 11:45:32,613 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -164.97716 ± 261.666
2025-05-07 11:45:32,613 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-27.574902, -744.975, -70.80099, -102.85394, -28.067244, -15.0015335, -12.129338, -619.1659, -31.509542, 2.3067136]
2025-05-07 11:45:32,614 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [83.0, 1000.0, 125.0, 143.0, 29.0, 79.0, 22.0, 1000.0, 62.0, 44.0]
2025-05-07 11:45:32,615 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 11/100 (estimated time remaining: 4 hours, 9 seconds)
2025-05-07 11:48:20,027 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 11:48:22,349 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -70.11394 ± 165.555
2025-05-07 11:48:22,349 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-15.232448, -111.03783, -42.344433, -555.5991, 7.4123044, 7.2111387, -0.47077778, -1.4299835, 9.406137, 0.94565153]
2025-05-07 11:48:22,349 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [25.0, 80.0, 78.0, 1000.0, 66.0, 24.0, 116.0, 20.0, 16.0, 20.0]
2025-05-07 11:48:22,351 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 12/100 (estimated time remaining: 4 hours, 1 minute, 55 seconds)
2025-05-07 11:50:58,383 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 11:51:02,718 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -166.56815 ± 243.395
2025-05-07 11:51:02,718 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-642.8555, -16.148462, -63.67564, -18.187395, -5.89263, -147.45078, -650.1449, -14.283523, -25.65795, -81.38465]
2025-05-07 11:51:02,718 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 28.0, 46.0, 78.0, 127.0, 205.0, 1000.0, 18.0, 45.0, 92.0]
2025-05-07 11:51:02,721 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 13/100 (estimated time remaining: 3 hours, 57 minutes, 5 seconds)
2025-05-07 11:53:39,699 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 11:53:42,126 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -95.15063 ± 175.157
2025-05-07 11:53:42,126 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-25.882828, -9.572841, -611.005, -36.26235, -10.347821, -86.09225, -37.261673, 4.6691117, -110.12394, -29.626762]
2025-05-07 11:53:42,126 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [48.0, 100.0, 1000.0, 59.0, 22.0, 80.0, 38.0, 22.0, 119.0, 40.0]
2025-05-07 11:53:42,128 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 14/100 (estimated time remaining: 3 hours, 56 minutes, 46 seconds)
2025-05-07 11:56:34,576 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 11:56:37,325 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -103.80654 ± 160.043
2025-05-07 11:56:37,325 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-31.463865, -88.373314, -74.713646, -41.171406, -24.81399, -14.147023, -140.87708, -570.59247, -36.777485, -15.135159]
2025-05-07 11:56:37,325 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [93.0, 92.0, 95.0, 72.0, 26.0, 25.0, 151.0, 1000.0, 156.0, 37.0]
2025-05-07 11:56:37,328 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 15/100 (estimated time remaining: 3 hours, 54 minutes, 47 seconds)
2025-05-07 11:59:05,727 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 11:59:06,980 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -50.37457 ± 41.440
2025-05-07 11:59:06,980 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-58.522297, -14.474718, -7.025702, -17.335737, -93.35765, -52.01832, -146.66393, -60.649845, -42.85653, -10.840879]
2025-05-07 11:59:06,980 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [170.0, 116.0, 28.0, 54.0, 194.0, 44.0, 96.0, 69.0, 83.0, 54.0]
2025-05-07 11:59:06,983 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 16/100 (estimated time remaining: 3 hours, 50 minutes, 44 seconds)
2025-05-07 12:01:55,861 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:01:58,494 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -68.65602 ± 147.310
2025-05-07 12:01:58,494 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-12.726379, -12.981636, -97.9145, -29.889547, 15.895538, -24.676353, -501.83395, -24.30997, -3.2329004, 5.109442]
2025-05-07 12:01:58,494 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [179.0, 37.0, 155.0, 45.0, 19.0, 48.0, 1000.0, 43.0, 23.0, 115.0]
2025-05-07 12:01:58,497 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 17/100 (estimated time remaining: 3 hours, 48 minutes, 31 seconds)
2025-05-07 12:04:35,232 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:04:36,653 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -49.40827 ± 41.332
2025-05-07 12:04:36,653 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-36.964157, -31.585047, -110.75492, -21.30116, -32.243484, -33.617615, -42.130447, -35.77341, -145.49377, -4.2187014]
2025-05-07 12:04:36,653 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [45.0, 114.0, 206.0, 78.0, 86.0, 94.0, 181.0, 44.0, 136.0, 29.0]
2025-05-07 12:04:36,655 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 18/100 (estimated time remaining: 3 hours, 45 minutes, 11 seconds)
2025-05-07 12:07:10,426 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:07:12,928 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -84.98584 ± 182.409
2025-05-07 12:07:12,928 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-622.0139, 10.780673, -0.1933329, -49.910435, -94.8887, -14.733596, 5.545788, -23.044928, -74.171165, 12.771156]
2025-05-07 12:07:12,928 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 59.0, 14.0, 78.0, 115.0, 23.0, 23.0, 154.0, 72.0, 26.0]
2025-05-07 12:07:12,931 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 19/100 (estimated time remaining: 3 hours, 41 minutes, 37 seconds)
2025-05-07 12:10:00,330 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:10:04,570 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -171.82504 ± 242.172
2025-05-07 12:10:04,570 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [10.131559, 13.480896, -63.52611, -33.686855, -19.685043, -168.7228, -701.4806, -567.1663, -183.39105, -4.2040844]
2025-05-07 12:10:04,570 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [28.0, 25.0, 77.0, 97.0, 70.0, 114.0, 1000.0, 1000.0, 159.0, 14.0]
2025-05-07 12:10:04,573 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 20/100 (estimated time remaining: 3 hours, 37 minutes, 57 seconds)
2025-05-07 12:12:36,456 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:12:40,412 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -168.08943 ± 295.808
2025-05-07 12:12:40,412 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-733.7997, 3.9314315, -32.70068, -781.66113, -45.22557, -4.942314, -15.285138, -7.017688, 4.025504, -68.21915]
2025-05-07 12:12:40,413 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 27.0, 24.0, 1000.0, 48.0, 24.0, 72.0, 23.0, 64.0, 113.0]
2025-05-07 12:12:40,415 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 21/100 (estimated time remaining: 3 hours, 36 minutes, 54 seconds)
2025-05-07 12:15:20,759 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:15:21,260 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -8.42747 ± 12.280
2025-05-07 12:15:21,261 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-14.922405, -2.977639, 0.057373475, 2.4180758, -32.030476, -9.372474, 6.895454, -19.48647, 5.40706, -20.263208]
2025-05-07 12:15:21,261 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [27.0, 35.0, 24.0, 18.0, 64.0, 21.0, 64.0, 43.0, 27.0, 42.0]
2025-05-07 12:15:21,264 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 22/100 (estimated time remaining: 3 hours, 31 minutes, 23 seconds)
2025-05-07 12:18:00,544 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:18:02,755 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -77.41508 ± 204.258
2025-05-07 12:18:02,756 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-2.5508199, -10.254447, -23.035503, 6.8052025, -12.985661, -7.151264, -689.7359, -5.9493375, -10.403811, -18.889288]
2025-05-07 12:18:02,756 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [55.0, 18.0, 43.0, 22.0, 52.0, 42.0, 1000.0, 44.0, 23.0, 61.0]
2025-05-07 12:18:02,759 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 23/100 (estimated time remaining: 3 hours, 29 minutes, 35 seconds)
2025-05-07 12:20:40,190 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:20:42,436 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -99.65556 ± 227.500
2025-05-07 12:20:42,436 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [4.975687, -12.052012, -3.336891, 0.48395187, -31.253132, -774.4705, -45.677113, -25.446249, -114.68235, 4.9030876]
2025-05-07 12:20:42,436 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [13.0, 35.0, 23.0, 22.0, 48.0, 1000.0, 46.0, 26.0, 123.0, 45.0]
2025-05-07 12:20:42,439 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 24/100 (estimated time remaining: 3 hours, 27 minutes, 46 seconds)
2025-05-07 12:23:27,096 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:23:28,169 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -27.41782 ± 48.362
2025-05-07 12:23:28,169 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [6.789853, -30.853014, -15.777881, -9.174841, -45.157204, 24.355944, -160.42833, -27.561169, -20.668148, 4.2965665]
2025-05-07 12:23:28,169 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [29.0, 92.0, 107.0, 36.0, 62.0, 90.0, 268.0, 41.0, 26.0, 19.0]
2025-05-07 12:23:28,172 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 25/100 (estimated time remaining: 3 hours, 23 minutes, 34 seconds)
2025-05-07 12:26:03,962 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:26:04,822 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -28.63307 ± 32.742
2025-05-07 12:26:04,822 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [4.461656, -17.60891, -58.18133, -40.933846, -44.0804, -95.864105, 11.307223, -40.55417, -19.91233, 15.035542]
2025-05-07 12:26:04,822 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [20.0, 31.0, 92.0, 132.0, 72.0, 135.0, 46.0, 55.0, 25.0, 26.0]
2025-05-07 12:26:04,825 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 26/100 (estimated time remaining: 3 hours, 21 minutes, 6 seconds)
2025-05-07 12:28:47,949 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:28:50,411 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -104.30288 ± 189.964
2025-05-07 12:28:50,411 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-48.782932, -8.248003, -11.8169, -43.664806, -58.881783, -42.114044, -668.9156, -102.681496, -17.784447, -40.138786]
2025-05-07 12:28:50,411 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [64.0, 38.0, 25.0, 47.0, 99.0, 65.0, 1000.0, 135.0, 25.0, 45.0]
2025-05-07 12:28:50,414 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 27/100 (estimated time remaining: 3 hours, 19 minutes, 35 seconds)
2025-05-07 12:31:28,839 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:31:34,002 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -206.77995 ± 238.166
2025-05-07 12:31:34,002 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-36.503902, -107.290794, -318.6395, -72.61351, -171.64624, -623.5384, -41.14832, -674.2404, 2.3553112, -24.533812]
2025-05-07 12:31:34,002 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [77.0, 162.0, 549.0, 75.0, 153.0, 1000.0, 79.0, 1000.0, 21.0, 56.0]
2025-05-07 12:31:34,005 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 28/100 (estimated time remaining: 3 hours, 17 minutes, 24 seconds)
2025-05-07 12:34:10,734 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:34:11,717 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -39.63900 ± 41.468
2025-05-07 12:34:11,717 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-22.992136, 10.748233, -111.907616, -4.301176, -117.18701, -18.163076, -15.250864, -25.134325, -29.597708, -62.604324]
2025-05-07 12:34:11,717 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [31.0, 21.0, 98.0, 42.0, 176.0, 53.0, 43.0, 46.0, 54.0, 144.0]
2025-05-07 12:34:11,720 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 29/100 (estimated time remaining: 3 hours, 14 minutes, 13 seconds)
2025-05-07 12:37:06,574 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:37:08,722 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -66.33501 ± 166.023
2025-05-07 12:37:08,722 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [3.4605975, -5.9065776, -563.42224, -3.050622, -1.8277787, -21.68894, -23.613022, -5.535195, -30.982775, -10.783537]
2025-05-07 12:37:08,722 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [19.0, 14.0, 1000.0, 42.0, 17.0, 30.0, 59.0, 40.0, 33.0, 55.0]
2025-05-07 12:37:08,726 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 30/100 (estimated time remaining: 3 hours, 14 minutes, 11 seconds)
2025-05-07 12:39:46,408 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:39:49,172 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -103.46883 ± 200.786
2025-05-07 12:39:49,172 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-29.547161, -3.7329879, -10.557109, -699.4308, -56.97783, -15.626634, -87.58791, -49.225613, -1.8883066, -80.11397]
2025-05-07 12:39:49,172 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [112.0, 19.0, 24.0, 1000.0, 86.0, 27.0, 193.0, 81.0, 22.0, 145.0]
2025-05-07 12:39:49,176 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 31/100 (estimated time remaining: 3 hours, 12 minutes, 20 seconds)
2025-05-07 12:42:27,576 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:42:28,484 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -35.88811 ± 20.112
2025-05-07 12:42:28,484 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-22.806278, -63.287693, -63.354813, -26.583118, -47.660023, -5.8820896, -48.15184, -50.88397, -16.942385, -13.328884]
2025-05-07 12:42:28,484 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [55.0, 77.0, 70.0, 83.0, 130.0, 29.0, 48.0, 69.0, 52.0, 49.0]
2025-05-07 12:42:28,488 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 32/100 (estimated time remaining: 3 hours, 8 minutes, 9 seconds)
2025-05-07 12:44:58,882 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:44:59,780 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -56.81158 ± 49.726
2025-05-07 12:44:59,780 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-20.989222, -83.936935, -78.80982, -21.996437, -1.9969115, -52.038013, -99.87339, -58.58753, -162.89041, 13.002784]
2025-05-07 12:44:59,780 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [86.0, 108.0, 65.0, 33.0, 21.0, 47.0, 85.0, 49.0, 131.0, 33.0]
2025-05-07 12:44:59,784 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 33/100 (estimated time remaining: 3 hours, 2 minutes, 38 seconds)
2025-05-07 12:47:36,677 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:47:37,991 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -50.34845 ± 40.156
2025-05-07 12:47:37,992 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-53.31475, -9.954795, -31.22879, -97.733444, -14.453089, -134.03047, -80.18817, -52.73406, -24.309454, -5.537453]
2025-05-07 12:47:37,992 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [128.0, 25.0, 47.0, 127.0, 35.0, 207.0, 123.0, 122.0, 100.0, 31.0]
2025-05-07 12:47:37,996 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 34/100 (estimated time remaining: 3 hours, 4 seconds)
2025-05-07 12:50:17,490 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:50:18,183 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -29.38371 ± 43.982
2025-05-07 12:50:18,183 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-7.779583, -2.6454751, -81.74617, -21.965622, -139.98558, -12.979257, 0.35451704, -24.660812, -14.824476, 12.395358]
2025-05-07 12:50:18,183 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [27.0, 37.0, 77.0, 32.0, 158.0, 35.0, 16.0, 22.0, 51.0, 52.0]
2025-05-07 12:50:18,187 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 53 minutes, 40 seconds)
2025-05-07 12:52:58,179 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:52:58,909 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -21.36164 ± 23.385
2025-05-07 12:52:58,909 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-31.710312, -79.69043, -38.69196, -11.007611, 0.3960415, -2.9405506, -3.8977528, -5.9775066, -10.207053, -29.889227]
2025-05-07 12:52:58,909 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [76.0, 212.0, 50.0, 19.0, 21.0, 32.0, 24.0, 26.0, 22.0, 41.0]
2025-05-07 12:52:58,913 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 51 minutes, 6 seconds)
2025-05-07 12:55:51,993 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:55:52,704 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -19.65944 ± 28.818
2025-05-07 12:55:52,704 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-3.0254898, -62.607265, -64.75036, -27.809383, 0.095698595, 36.83611, -10.174238, -32.900684, -26.415308, -5.8435297]
2025-05-07 12:55:52,704 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [61.0, 68.0, 102.0, 57.0, 23.0, 103.0, 17.0, 31.0, 36.0, 21.0]
2025-05-07 12:55:52,708 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 51 minutes, 34 seconds)
2025-05-07 12:58:22,173 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:58:23,165 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -60.34911 ± 28.738
2025-05-07 12:58:23,165 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-78.43146, -8.191686, -97.52028, -73.45725, -93.63699, -83.98505, -36.655533, -57.396755, -24.059965, -50.156162]
2025-05-07 12:58:23,165 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [80.0, 36.0, 63.0, 64.0, 124.0, 104.0, 55.0, 45.0, 34.0, 122.0]
2025-05-07 12:58:23,169 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 48 minutes, 42 seconds)
2025-05-07 13:01:09,158 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 13:01:11,609 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -80.15201 ± 182.412
2025-05-07 13:01:11,609 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-1.586675, -8.54252, -620.90155, -49.086906, -3.1586964, -15.355609, -1.7584994, -85.07043, 18.09811, -34.15724]
2025-05-07 13:01:11,609 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [19.0, 31.0, 1000.0, 112.0, 32.0, 43.0, 18.0, 156.0, 62.0, 76.0]
2025-05-07 13:01:11,614 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 48 minutes, 8 seconds)
2025-05-07 13:03:39,236 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 13:03:45,116 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -226.44174 ± 282.173
2025-05-07 13:03:45,116 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-39.71442, -90.32709, -15.287455, -4.365233, -691.9824, -64.64044, -616.7884, -71.96218, -13.46931, -655.8806]
2025-05-07 13:03:45,116 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [71.0, 124.0, 54.0, 53.0, 1000.0, 52.0, 1000.0, 64.0, 81.0, 1000.0]
2025-05-07 13:03:45,120 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 44 minutes, 4 seconds)
2025-05-07 13:06:36,069 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 13:06:36,517 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -14.37040 ± 10.696
2025-05-07 13:06:36,517 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-15.363164, -4.905812, -2.2322335, -34.3793, -4.369283, -5.5694733, -21.687477, -8.70178, -29.768661, -16.726782]
2025-05-07 13:06:36,517 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [23.0, 30.0, 21.0, 29.0, 27.0, 25.0, 45.0, 17.0, 74.0, 41.0]
2025-05-07 13:06:36,523 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 41/100 (estimated time remaining: 2 hours, 43 minutes, 31 seconds)
2025-05-07 13:09:05,305 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 13:09:06,088 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -25.21801 ± 23.656
2025-05-07 13:09:06,088 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-21.190136, 0.82009006, -7.573369, -57.41024, -1.0994607, -51.03048, -33.01518, -6.1982765, -9.722045, -65.76098]
2025-05-07 13:09:06,088 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [35.0, 32.0, 33.0, 149.0, 17.0, 43.0, 87.0, 27.0, 33.0, 119.0]
2025-05-07 13:09:06,093 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 42/100 (estimated time remaining: 2 hours, 36 minutes, 1 second)
2025-05-07 13:11:43,155 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 13:11:44,272 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -47.29443 ± 42.402
2025-05-07 13:11:44,272 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-10.289061, -16.982607, -63.44651, -96.88767, 5.5234694, -24.013733, -86.593506, -65.5318, -118.84045, 4.1176043]
2025-05-07 13:11:44,272 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [61.0, 20.0, 85.0, 171.0, 62.0, 18.0, 102.0, 116.0, 129.0, 46.0]
2025-05-07 13:11:44,277 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 43/100 (estimated time remaining: 2 hours, 34 minutes, 52 seconds)
2025-05-07 13:14:24,025 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 13:14:25,260 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -52.25130 ± 48.481
2025-05-07 13:14:25,260 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-72.592575, -114.68343, -65.68427, -1.9636765, -19.047129, -12.534916, -24.123095, -14.090846, -38.878605, -158.9145]
2025-05-07 13:14:25,260 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [80.0, 236.0, 124.0, 19.0, 42.0, 28.0, 66.0, 56.0, 49.0, 180.0]
2025-05-07 13:14:25,265 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 44/100 (estimated time remaining: 2 hours, 30 minutes, 47 seconds)
2025-05-07 13:17:14,627 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 13:17:15,258 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -20.29545 ± 20.260
2025-05-07 13:17:15,258 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [4.9588556, -8.777874, -23.161404, -16.287638, -49.92414, -6.679655, 7.241027, -41.706448, -16.45927, -52.157948]
2025-05-07 13:17:15,258 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [14.0, 20.0, 41.0, 31.0, 69.0, 41.0, 28.0, 86.0, 50.0, 85.0]
2025-05-07 13:17:15,263 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 45/100 (estimated time remaining: 2 hours, 31 minutes, 13 seconds)
2025-05-07 13:19:44,704 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 13:19:45,661 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -35.73128 ± 44.716
2025-05-07 13:19:45,661 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-40.950268, -25.27099, -149.98076, 1.0652996, -69.16928, -12.888033, -3.972774, -54.12525, -6.1808405, 4.1601105]
2025-05-07 13:19:45,661 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [50.0, 80.0, 172.0, 18.0, 193.0, 18.0, 37.0, 79.0, 24.0, 18.0]
2025-05-07 13:19:45,666 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 46/100 (estimated time remaining: 2 hours, 24 minutes, 40 seconds)
2025-05-07 13:22:23,794 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 13:22:26,639 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -104.11150 ± 190.389
2025-05-07 13:22:26,639 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-633.2168, -20.846546, -5.987031, -46.550835, -17.250025, -250.3972, -39.229065, 4.573749, 17.022646, -49.233986]
2025-05-07 13:22:26,640 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 72.0, 16.0, 48.0, 39.0, 372.0, 111.0, 27.0, 27.0, 76.0]
2025-05-07 13:22:26,645 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 47/100 (estimated time remaining: 2 hours, 24 minutes, 5 seconds)
2025-05-07 13:25:20,651 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 13:25:23,243 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -102.18803 ± 161.612
2025-05-07 13:25:23,244 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-15.677105, -10.882955, -111.88514, -117.194046, -568.3773, -80.58296, 2.1392863, -93.48829, 0.5436736, -26.475433]
2025-05-07 13:25:23,244 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [32.0, 20.0, 132.0, 138.0, 1000.0, 107.0, 20.0, 87.0, 16.0, 87.0]
2025-05-07 13:25:23,249 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 48/100 (estimated time remaining: 2 hours, 24 minutes, 41 seconds)
2025-05-07 13:28:00,131 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 13:28:00,969 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -32.19131 ± 42.219
2025-05-07 13:28:00,970 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-11.313259, -98.197365, -27.283524, 7.6739383, -128.65013, -17.343859, -18.64772, -5.064919, -3.7513793, -19.334879]
2025-05-07 13:28:00,970 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [43.0, 175.0, 50.0, 14.0, 135.0, 51.0, 75.0, 24.0, 17.0, 25.0]
2025-05-07 13:28:00,975 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 49/100 (estimated time remaining: 2 hours, 21 minutes, 23 seconds)
2025-05-07 13:30:32,280 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 13:30:34,935 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -93.39187 ± 183.165
2025-05-07 13:30:34,935 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-95.121086, -24.38154, -4.423825, -116.77243, -22.393286, 7.871123, -629.8692, -44.172073, -11.181094, 6.524733]
2025-05-07 13:30:34,935 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [106.0, 24.0, 19.0, 217.0, 26.0, 80.0, 1000.0, 137.0, 38.0, 22.0]
2025-05-07 13:30:34,941 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 50/100 (estimated time remaining: 2 hours, 15 minutes, 56 seconds)
2025-05-07 13:33:10,633 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 13:33:11,471 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -45.94301 ± 79.977
2025-05-07 13:33:11,472 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-50.23565, -278.84268, -56.0756, -15.035398, -29.622778, -17.422253, -5.9170914, -12.506856, 7.126879, -0.898689]
2025-05-07 13:33:11,472 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [40.0, 240.0, 77.0, 46.0, 54.0, 54.0, 20.0, 24.0, 35.0, 17.0]
2025-05-07 13:33:11,478 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 51/100 (estimated time remaining: 2 hours, 14 minutes, 18 seconds)
2025-05-07 13:35:48,876 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 13:35:49,415 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -12.07010 ± 29.023
2025-05-07 13:35:49,415 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-1.6647699, -85.95682, -5.115229, -21.742466, 3.6488402, -29.195023, 31.46209, -11.398415, 1.06665, -1.8058089]
2025-05-07 13:35:49,415 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [39.0, 78.0, 23.0, 22.0, 42.0, 86.0, 43.0, 22.0, 18.0, 17.0]
2025-05-07 13:35:49,421 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 52/100 (estimated time remaining: 2 hours, 11 minutes, 7 seconds)
2025-05-07 13:38:33,921 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 13:38:38,453 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -181.10233 ± 253.217
2025-05-07 13:38:38,454 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-75.52682, -50.736897, -561.671, -42.265945, -17.993942, -32.304306, -785.4841, -126.509155, -25.750587, -92.780556]
2025-05-07 13:38:38,454 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [121.0, 109.0, 1000.0, 49.0, 56.0, 49.0, 1000.0, 137.0, 54.0, 165.0]
2025-05-07 13:38:38,459 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 53/100 (estimated time remaining: 2 hours, 7 minutes, 14 seconds)
2025-05-07 13:41:26,938 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 13:41:29,390 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -137.58812 ± 355.787
2025-05-07 13:41:29,390 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-5.2115607, -82.31189, -44.967644, -21.26749, -1202.1886, 3.6614225, -8.4492035, -22.368837, 2.7375221, 4.484951]
2025-05-07 13:41:29,390 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [62.0, 106.0, 60.0, 56.0, 1000.0, 15.0, 18.0, 34.0, 23.0, 70.0]
2025-05-07 13:41:29,397 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 54/100 (estimated time remaining: 2 hours, 6 minutes, 39 seconds)
2025-05-07 13:43:59,746 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 13:44:02,137 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -103.45943 ± 185.960
2025-05-07 13:44:02,137 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-37.14479, -65.45626, -34.380424, -12.018142, -2.071202, -657.19635, -39.110386, -69.70623, -76.840385, -40.670063]
2025-05-07 13:44:02,137 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [47.0, 116.0, 32.0, 30.0, 25.0, 1000.0, 48.0, 53.0, 81.0, 51.0]
2025-05-07 13:44:02,144 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 55/100 (estimated time remaining: 2 hours, 3 minutes, 46 seconds)
2025-05-07 13:46:33,824 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 13:46:34,237 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -16.68544 ± 17.986
2025-05-07 13:46:34,237 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-13.086305, -40.2702, -26.914217, -33.596493, -3.1251245, 7.647404, -47.47732, -5.664061, -0.06372067, -4.30433]
2025-05-07 13:46:34,237 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [37.0, 24.0, 28.0, 32.0, 34.0, 25.0, 50.0, 46.0, 14.0, 20.0]
2025-05-07 13:46:34,242 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 56/100 (estimated time remaining: 2 hours, 24 seconds)
2025-05-07 13:49:20,536 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 13:49:24,662 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -236.80493 ± 454.783
2025-05-07 13:49:24,662 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-43.956028, -771.98254, -29.836586, -32.45186, -1.1282777, 2.972586, -7.1783657, -1423.383, -55.664295, -5.4408903]
2025-05-07 13:49:24,662 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [64.0, 1000.0, 58.0, 55.0, 43.0, 28.0, 16.0, 1000.0, 62.0, 24.0]
2025-05-07 13:49:24,669 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 59 minutes, 34 seconds)
2025-05-07 13:52:05,548 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 13:52:06,280 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -20.92003 ± 30.199
2025-05-07 13:52:06,280 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [39.639153, -28.819061, -8.206462, -79.54566, -11.335062, -34.060272, 10.03171, -30.688698, -23.713388, -42.50257]
2025-05-07 13:52:06,280 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [136.0, 58.0, 53.0, 86.0, 18.0, 30.0, 32.0, 30.0, 50.0, 42.0]
2025-05-07 13:52:06,287 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 55 minutes, 47 seconds)
2025-05-07 13:54:46,504 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 13:54:50,530 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -138.68483 ± 249.775
2025-05-07 13:54:50,531 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [7.061495, -72.64574, -533.4289, 4.1084332, -29.087122, -723.6818, 7.4263396, -32.34942, 9.518483, -23.770067]
2025-05-07 13:54:50,531 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [47.0, 91.0, 1000.0, 29.0, 109.0, 1000.0, 49.0, 24.0, 30.0, 33.0]
2025-05-07 13:54:50,537 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 52 minutes, 9 seconds)
2025-05-07 13:57:25,600 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 13:57:28,023 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -103.25234 ± 197.427
2025-05-07 13:57:28,024 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-25.22194, -19.001287, -0.4862299, -40.2235, -4.2799973, -21.61777, 1.2075804, -113.72862, -128.02861, -681.14307]
2025-05-07 13:57:28,024 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [34.0, 49.0, 30.0, 84.0, 41.0, 28.0, 24.0, 124.0, 99.0, 1000.0]
2025-05-07 13:57:28,029 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 50 minutes, 8 seconds)
2025-05-07 14:00:08,819 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 14:00:11,137 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -112.57572 ± 292.691
2025-05-07 14:00:11,137 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-989.61096, -43.12735, -6.9522347, -1.2889035, -14.230978, -4.593788, -5.569798, -34.790764, 1.0368624, -26.629423]
2025-05-07 14:00:11,137 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 47.0, 24.0, 58.0, 60.0, 26.0, 35.0, 64.0, 13.0, 32.0]
2025-05-07 14:00:11,144 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 48 minutes, 55 seconds)
2025-05-07 14:02:43,061 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 14:02:47,012 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -165.74614 ± 289.833
2025-05-07 14:02:47,013 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-26.04256, -79.69301, -26.540674, 5.6253276, -23.487286, -730.6466, -30.520971, -756.04724, 5.837468, 4.0540576]
2025-05-07 14:02:47,013 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [41.0, 90.0, 46.0, 14.0, 39.0, 1000.0, 63.0, 1000.0, 24.0, 52.0]
2025-05-07 14:02:47,020 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 44 minutes, 18 seconds)
2025-05-07 14:05:26,236 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 14:05:27,083 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -40.82285 ± 25.869
2025-05-07 14:05:27,084 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-66.81327, -25.942104, -48.166473, -85.5588, -28.4563, -21.127405, -66.27121, -53.1364, -2.7139745, -10.042554]
2025-05-07 14:05:27,084 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [58.0, 33.0, 50.0, 67.0, 73.0, 34.0, 139.0, 58.0, 49.0, 59.0]
2025-05-07 14:05:27,090 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 41 minutes, 26 seconds)
2025-05-07 14:08:14,307 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 14:08:20,114 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -209.85149 ± 287.246
2025-05-07 14:08:20,114 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-614.962, -75.04806, -756.125, 29.689487, -16.853334, -17.434223, -73.324066, -6.7628617, -17.47309, -550.2217]
2025-05-07 14:08:20,114 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 65.0, 1000.0, 39.0, 56.0, 70.0, 55.0, 108.0, 47.0, 1000.0]
2025-05-07 14:08:20,120 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 39 minutes, 50 seconds)
2025-05-07 14:10:54,644 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 14:10:57,847 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -129.37831 ± 204.439
2025-05-07 14:10:57,847 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [2.4384117, -5.1729164, -56.719223, -23.342993, -99.51338, -185.51373, -703.9477, -203.58612, -15.085588, -3.3398728]
2025-05-07 14:10:57,847 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [24.0, 32.0, 318.0, 34.0, 107.0, 235.0, 1000.0, 212.0, 27.0, 27.0]
2025-05-07 14:10:57,854 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 37 minutes, 10 seconds)
2025-05-07 14:13:35,092 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 14:13:35,976 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -37.99015 ± 51.624
2025-05-07 14:13:35,976 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [51.412453, -24.746622, -135.49834, -13.702767, -72.75606, 5.832074, -38.73128, -7.7836084, -40.071854, -103.85547]
2025-05-07 14:13:35,976 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [90.0, 27.0, 105.0, 64.0, 54.0, 17.0, 81.0, 16.0, 68.0, 117.0]
2025-05-07 14:13:35,983 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 33 minutes, 53 seconds)
2025-05-07 14:16:17,076 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 14:16:19,514 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -73.54453 ± 163.565
2025-05-07 14:16:19,514 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-81.222145, -10.240328, -19.470879, -16.636591, -11.065192, -5.807406, -3.2040992, -559.461, 4.551372, -32.889038]
2025-05-07 14:16:19,514 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [143.0, 69.0, 32.0, 63.0, 27.0, 16.0, 24.0, 1000.0, 53.0, 69.0]
2025-05-07 14:16:19,522 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 32 minutes, 5 seconds)
2025-05-07 14:19:10,222 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 14:19:14,325 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -165.73622 ± 299.456
2025-05-07 14:19:14,325 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [2.9882274, -38.803684, -20.372654, -595.93365, -14.596726, -2.6746163, -35.722004, -900.4157, -6.304287, -45.527077]
2025-05-07 14:19:14,326 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [27.0, 54.0, 88.0, 1000.0, 36.0, 28.0, 39.0, 1000.0, 23.0, 92.0]
2025-05-07 14:19:14,332 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 30 minutes, 59 seconds)
2025-05-07 14:21:47,609 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 14:21:48,479 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -37.71453 ± 45.280
2025-05-07 14:21:48,479 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [2.4397779, -1.6662446, -32.871796, -23.143852, 7.9723706, -86.461044, -79.92904, -3.0523338, -136.32147, -24.111671]
2025-05-07 14:21:48,479 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [24.0, 27.0, 45.0, 37.0, 30.0, 95.0, 117.0, 34.0, 153.0, 74.0]
2025-05-07 14:21:48,487 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 26 minutes, 13 seconds)
2025-05-07 14:24:28,688 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 14:24:29,764 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -44.14674 ± 44.018
2025-05-07 14:24:29,764 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [1.1987467, -75.56407, -33.156742, -47.107517, -134.22221, -31.873085, -88.775444, 26.141075, -12.457133, -45.650978]
2025-05-07 14:24:29,764 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [21.0, 76.0, 40.0, 54.0, 221.0, 95.0, 132.0, 46.0, 18.0, 85.0]
2025-05-07 14:24:29,772 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 23 minutes, 53 seconds)
2025-05-07 14:27:04,176 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 14:27:06,820 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -85.35387 ± 165.165
2025-05-07 14:27:06,820 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-12.023441, 3.666075, 17.457819, 5.990908, -65.75865, 2.4023118, -9.213449, -552.4631, -175.88278, -67.71442]
2025-05-07 14:27:06,820 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [26.0, 52.0, 79.0, 23.0, 121.0, 84.0, 42.0, 1000.0, 188.0, 69.0]
2025-05-07 14:27:06,828 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 21 minutes, 5 seconds)
2025-05-07 14:30:01,280 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 14:30:03,722 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -103.92368 ± 197.796
2025-05-07 14:30:03,722 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-1.6104338, -29.433765, -45.431614, -161.74777, 7.7789826, -3.0659661, 10.32547, -677.5584, -43.876286, -94.61705]
2025-05-07 14:30:03,722 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [15.0, 53.0, 47.0, 185.0, 88.0, 24.0, 24.0, 1000.0, 41.0, 72.0]
2025-05-07 14:30:03,730 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 19 minutes, 40 seconds)
2025-05-07 14:32:29,091 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 14:32:33,337 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -254.80046 ± 455.466
2025-05-07 14:32:33,338 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-1166.3843, -1.2679831, -60.589535, -77.38682, -26.009985, -6.162102, -1162.6669, -2.925429, -15.385504, -29.22599]
2025-05-07 14:32:33,338 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 26.0, 51.0, 73.0, 55.0, 15.0, 1000.0, 58.0, 66.0, 87.0]
2025-05-07 14:32:33,345 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 14 minutes, 34 seconds)
2025-05-07 14:35:17,059 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 14:35:19,333 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -121.15572 ± 278.518
2025-05-07 14:35:19,333 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-953.6153, -45.25673, -36.21986, -23.589891, 1.3145083, -5.998158, -25.647697, -40.383892, 0.80513954, -82.965416]
2025-05-07 14:35:19,333 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 49.0, 34.0, 28.0, 23.0, 20.0, 68.0, 54.0, 28.0, 74.0]
2025-05-07 14:35:19,341 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 12 minutes, 58 seconds)
2025-05-07 14:38:05,959 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 14:38:07,107 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -80.95639 ± 70.474
2025-05-07 14:38:07,107 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-5.999906, -30.397453, -103.12481, -43.620457, -73.58747, -4.194347, -78.46299, -176.08765, -57.973312, -236.11546]
2025-05-07 14:38:07,107 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [40.0, 52.0, 82.0, 59.0, 60.0, 52.0, 134.0, 134.0, 71.0, 163.0]
2025-05-07 14:38:07,116 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 10 minutes, 50 seconds)
2025-05-07 14:40:36,806 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 14:40:39,367 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -172.51419 ± 410.421
2025-05-07 14:40:39,368 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [12.864636, -21.427244, -219.12764, -20.256899, 5.138004, -5.967153, -32.03263, -1389.386, -30.526215, -24.420887]
2025-05-07 14:40:39,368 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [36.0, 50.0, 156.0, 87.0, 69.0, 21.0, 71.0, 1000.0, 36.0, 38.0]
2025-05-07 14:40:39,376 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 7 minutes, 42 seconds)
2025-05-07 14:43:34,647 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 14:43:36,821 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -106.95531 ± 274.632
2025-05-07 14:43:36,821 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-22.137682, -60.937607, -30.903746, -5.065695, -0.086207196, -0.22477873, -24.007492, -10.781475, -928.7459, 13.337481]
2025-05-07 14:43:36,821 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [26.0, 56.0, 32.0, 32.0, 16.0, 32.0, 44.0, 13.0, 1000.0, 34.0]
2025-05-07 14:43:36,830 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 5 minutes, 2 seconds)
2025-05-07 14:46:01,672 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 14:46:02,151 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -15.40065 ± 15.056
2025-05-07 14:46:02,151 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-9.254301, -20.303432, -28.866714, -4.4539804, -27.329489, -43.18092, 6.064868, -5.394307, 3.2072315, -24.495512]
2025-05-07 14:46:02,151 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [29.0, 25.0, 23.0, 86.0, 24.0, 36.0, 22.0, 14.0, 63.0, 33.0]
2025-05-07 14:46:02,160 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 2 minutes)
2025-05-07 14:48:41,835 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 14:48:42,485 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -32.31411 ± 39.467
2025-05-07 14:48:42,485 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-6.6240954, -116.542336, -45.619354, -6.5475364, -11.195405, -36.886906, -6.8791394, 3.3611143, -93.35988, -2.8475296]
2025-05-07 14:48:42,485 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [64.0, 54.0, 38.0, 45.0, 24.0, 93.0, 21.0, 22.0, 96.0, 30.0]
2025-05-07 14:48:42,494 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 79/100 (estimated time remaining: 58 minutes, 53 seconds)
2025-05-07 14:51:25,724 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 14:51:28,166 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -96.96703 ± 178.562
2025-05-07 14:51:28,166 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-21.482494, -80.7932, -40.029335, -10.372132, -83.25987, 12.748239, -58.83211, -58.7035, -624.6197, -4.3262215]
2025-05-07 14:51:28,166 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [61.0, 73.0, 45.0, 30.0, 103.0, 91.0, 96.0, 43.0, 1000.0, 21.0]
2025-05-07 14:51:28,175 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 80/100 (estimated time remaining: 56 minutes, 4 seconds)
2025-05-07 14:54:06,500 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 14:54:07,262 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -43.55327 ± 45.767
2025-05-07 14:54:07,262 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-69.70987, -4.2489424, -74.48582, -91.157906, 5.3229885, -141.69484, -10.521539, -6.0574245, -22.62609, -20.35325]
2025-05-07 14:54:07,262 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [65.0, 19.0, 62.0, 93.0, 34.0, 155.0, 33.0, 15.0, 51.0, 39.0]
2025-05-07 14:54:07,271 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 81/100 (estimated time remaining: 53 minutes, 51 seconds)
2025-05-07 14:57:01,343 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 14:57:03,676 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -77.55016 ± 146.273
2025-05-07 14:57:03,677 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [0.73331183, -92.74579, 4.386686, -11.547661, -55.847374, 11.910614, -496.63995, -122.65713, -18.817854, 5.7236032]
2025-05-07 14:57:03,677 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [27.0, 61.0, 76.0, 42.0, 78.0, 35.0, 1000.0, 111.0, 35.0, 16.0]
2025-05-07 14:57:03,686 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 82/100 (estimated time remaining: 51 minutes, 6 seconds)
2025-05-07 14:59:48,016 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 14:59:52,631 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -191.17189 ± 262.721
2025-05-07 14:59:52,631 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-46.998356, -24.64155, -685.2715, -37.815056, -13.829743, -712.1595, -260.42398, -31.719591, -29.772907, -69.08667]
2025-05-07 14:59:52,632 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [45.0, 51.0, 1000.0, 47.0, 50.0, 1000.0, 286.0, 72.0, 59.0, 66.0]
2025-05-07 14:59:52,639 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 83/100 (estimated time remaining: 49 minutes, 49 seconds)
2025-05-07 15:02:25,788 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 15:02:26,836 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -66.38214 ± 75.755
2025-05-07 15:02:26,837 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-6.31434, -40.73688, -41.045216, -1.9288347, -8.196539, -222.30269, -198.80566, -11.340098, -72.240814, -60.910275]
2025-05-07 15:02:26,837 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [28.0, 82.0, 39.0, 40.0, 35.0, 155.0, 230.0, 18.0, 96.0, 40.0]
2025-05-07 15:02:26,846 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 84/100 (estimated time remaining: 46 minutes, 42 seconds)
2025-05-07 15:05:07,183 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 15:05:07,916 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -37.90388 ± 36.667
2025-05-07 15:05:07,916 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-6.131644, -12.138321, -38.025463, -22.312141, -26.25822, -43.5152, -50.397118, -136.80211, -42.955956, -0.5026624]
2025-05-07 15:05:07,916 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [28.0, 73.0, 71.0, 34.0, 56.0, 50.0, 57.0, 122.0, 40.0, 20.0]
2025-05-07 15:05:07,926 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 85/100 (estimated time remaining: 43 minutes, 43 seconds)
2025-05-07 15:08:00,915 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 15:08:03,371 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -111.19456 ± 183.765
2025-05-07 15:08:03,371 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-30.767172, -26.12249, -93.36284, -44.899982, -61.2424, -22.285164, -26.666336, -93.87012, -55.352478, -657.3765]
2025-05-07 15:08:03,371 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [57.0, 43.0, 63.0, 44.0, 92.0, 45.0, 49.0, 85.0, 79.0, 1000.0]
2025-05-07 15:08:03,379 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 86/100 (estimated time remaining: 41 minutes, 48 seconds)
2025-05-07 15:10:30,083 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 15:10:31,237 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -63.54759 ± 44.172
2025-05-07 15:10:31,237 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-34.107983, -173.76425, -30.22118, -75.8693, -42.43868, -68.44146, -57.405766, -93.24266, -1.7650532, -58.21953]
2025-05-07 15:10:31,237 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [81.0, 140.0, 43.0, 69.0, 36.0, 102.0, 145.0, 91.0, 69.0, 76.0]
2025-05-07 15:10:31,247 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 87/100 (estimated time remaining: 37 minutes, 41 seconds)
2025-05-07 15:13:19,447 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 15:13:23,694 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -202.71660 ± 327.460
2025-05-07 15:13:23,694 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [5.395336, -255.9812, -18.012756, -21.30774, -898.13837, -12.206801, -30.626772, -780.0978, -23.59457, 7.40466]
2025-05-07 15:13:23,694 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [13.0, 338.0, 25.0, 38.0, 1000.0, 24.0, 28.0, 1000.0, 23.0, 22.0]
2025-05-07 15:13:23,704 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 88/100 (estimated time remaining: 35 minutes, 8 seconds)
2025-05-07 15:15:56,264 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 15:15:57,002 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -38.08722 ± 45.802
2025-05-07 15:15:57,002 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-58.545284, -5.190773, -3.5211065, -33.587532, -45.737026, -14.084111, -164.6504, -17.919283, -33.980957, -3.6557238]
2025-05-07 15:15:57,002 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [60.0, 33.0, 93.0, 43.0, 45.0, 45.0, 122.0, 36.0, 44.0, 29.0]
2025-05-07 15:15:57,012 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 89/100 (estimated time remaining: 32 minutes, 24 seconds)
2025-05-07 15:18:35,201 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 15:18:36,080 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -35.40533 ± 47.377
2025-05-07 15:18:36,080 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-11.615236, -12.262928, -0.5363073, -21.091442, -123.456985, -33.867935, -11.870314, 19.752756, -127.89441, -31.210531]
2025-05-07 15:18:36,080 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [25.0, 29.0, 27.0, 44.0, 146.0, 81.0, 34.0, 35.0, 164.0, 54.0]
2025-05-07 15:18:36,090 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 90/100 (estimated time remaining: 29 minutes, 37 seconds)
2025-05-07 15:21:14,715 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 15:21:18,770 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -185.64804 ± 323.367
2025-05-07 15:21:18,770 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [35.496155, -28.620514, -7.112869, -904.3705, 7.449043, -5.862809, 20.838919, -120.09672, -735.7443, -118.456764]
2025-05-07 15:21:18,770 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [37.0, 99.0, 34.0, 1000.0, 17.0, 20.0, 34.0, 43.0, 1000.0, 168.0]
2025-05-07 15:21:18,779 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 91/100 (estimated time remaining: 26 minutes, 30 seconds)
2025-05-07 15:24:04,946 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 15:24:05,786 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -51.93559 ± 51.879
2025-05-07 15:24:05,786 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-17.459927, -8.966448, -75.40683, -74.13449, -16.740854, -3.4740608, -148.57701, -138.46017, -10.040602, -26.095428]
2025-05-07 15:24:05,786 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [26.0, 35.0, 65.0, 85.0, 39.0, 24.0, 166.0, 110.0, 26.0, 42.0]
2025-05-07 15:24:05,796 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 92/100 (estimated time remaining: 24 minutes, 26 seconds)
2025-05-07 15:26:40,627 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 15:26:43,110 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -175.61343 ± 422.909
2025-05-07 15:26:43,110 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-0.4212285, -104.29968, -30.969213, -43.232956, -53.21224, -50.20395, 0.95183104, -1441.2551, -10.261833, -23.229975]
2025-05-07 15:26:43,110 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [22.0, 83.0, 35.0, 117.0, 81.0, 71.0, 15.0, 1000.0, 16.0, 48.0]
2025-05-07 15:26:43,120 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 93/100 (estimated time remaining: 21 minutes, 19 seconds)
2025-05-07 15:29:26,959 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 15:29:31,405 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -227.30301 ± 362.147
2025-05-07 15:29:31,405 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-5.576325, -44.29337, -11.790073, -35.73473, -1177.067, -596.40784, -275.6098, -17.913435, -27.97123, -80.66632]
2025-05-07 15:29:31,405 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [17.0, 74.0, 12.0, 42.0, 1000.0, 1000.0, 364.0, 26.0, 55.0, 61.0]
2025-05-07 15:29:31,415 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 94/100 (estimated time remaining: 19 minutes)
2025-05-07 15:32:06,545 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 15:32:09,214 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -112.65955 ± 194.501
2025-05-07 15:32:09,214 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-9.541528, -112.40782, -202.86137, -19.272806, 6.699863, 5.3171763, -41.879406, -59.204773, -26.401999, -667.04285]
2025-05-07 15:32:09,214 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [49.0, 159.0, 200.0, 42.0, 15.0, 29.0, 104.0, 64.0, 44.0, 1000.0]
2025-05-07 15:32:09,223 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 95/100 (estimated time remaining: 16 minutes, 15 seconds)
2025-05-07 15:34:47,284 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 15:34:49,416 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -139.10452 ± 271.305
2025-05-07 15:34:49,416 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-28.103485, -941.7706, -99.45448, -118.038345, -30.273617, -21.525183, 9.595512, -16.329933, -126.16593, -18.979286]
2025-05-07 15:34:49,416 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [23.0, 907.0, 78.0, 77.0, 46.0, 33.0, 29.0, 50.0, 90.0, 36.0]
2025-05-07 15:34:49,427 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 96/100 (estimated time remaining: 13 minutes, 30 seconds)
2025-05-07 15:37:39,542 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 15:37:40,443 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -64.55687 ± 49.374
2025-05-07 15:37:40,443 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-28.455114, -150.69896, -37.41337, -65.529335, 4.1773214, -20.592731, -120.63023, -40.894623, -54.27464, -131.25703]
2025-05-07 15:37:40,443 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [41.0, 149.0, 77.0, 71.0, 20.0, 23.0, 119.0, 46.0, 53.0, 70.0]
2025-05-07 15:37:40,454 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 97/100 (estimated time remaining: 10 minutes, 51 seconds)
2025-05-07 15:40:11,201 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 15:40:12,120 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -50.56895 ± 51.423
2025-05-07 15:40:12,120 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-73.52804, -37.482307, -13.472369, -65.77809, -14.092404, -178.08301, -88.79394, -29.722506, -3.8677561, -0.8690382]
2025-05-07 15:40:12,120 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [57.0, 50.0, 47.0, 111.0, 31.0, 179.0, 76.0, 78.0, 26.0, 24.0]
2025-05-07 15:40:12,131 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 98/100 (estimated time remaining: 8 minutes, 5 seconds)
2025-05-07 15:42:50,276 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 15:42:50,979 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -45.31161 ± 48.005
2025-05-07 15:42:50,979 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-164.12572, -61.610672, -19.988247, -34.81856, -94.03802, -24.758814, -19.754011, -37.142857, 1.8517001, 1.2690693]
2025-05-07 15:42:50,979 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [148.0, 65.0, 32.0, 71.0, 78.0, 42.0, 24.0, 32.0, 13.0, 17.0]
2025-05-07 15:42:50,990 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 99/100 (estimated time remaining: 5 minutes, 19 seconds)
2025-05-07 15:45:32,702 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 15:45:36,784 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -203.72246 ± 317.215
2025-05-07 15:45:36,784 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-816.7758, -69.95258, 5.292585, -31.486813, -164.17813, -37.845, -15.306785, -847.1183, -19.316118, -40.537563]
2025-05-07 15:45:36,784 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 98.0, 30.0, 55.0, 101.0, 36.0, 35.0, 1000.0, 60.0, 27.0]
2025-05-07 15:45:36,795 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 41 seconds)
2025-05-07 15:48:13,425 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 15:48:15,665 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -116.37441 ± 215.571
2025-05-07 15:48:15,665 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [0.4105921, -137.9687, -6.568614, -71.56206, -35.692753, -21.042767, -15.958241, -20.69158, -104.60559, -750.06445]
2025-05-07 15:48:15,665 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [16.0, 97.0, 26.0, 59.0, 35.0, 37.0, 29.0, 28.0, 70.0, 1000.0]
2025-05-07 15:48:15,676 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1149 [DEBUG]: Training session finished
