2025-05-10 22:03:54,391 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc4/noisy-ant/ExtremeClogL1U23-sac-aug-mem2
2025-05-10 22:03:54,391 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc4/noisy-ant/ExtremeClogL1U23-sac-aug-mem2
2025-05-10 22:03:54,392 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeClogL1U23': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x7ef8401cc3d0>}
2025-05-10 22:03:54,392 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1111 [DEBUG]: using device: cpu
2025-05-10 22:03:54,392 baseline-sac-noisy-ant:77 [WARNING]: args.memorize_actions != args.horizon: 2 != 24
2025-05-10 22:03:54,401 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1133 [INFO]: Creating new trainer
2025-05-10 22:03:54,413 baseline-sac-noisy-ant:111 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=43, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1., -1., -1.]]))
)
2025-05-10 22:03:54,413 baseline-sac-noisy-ant:112 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=51, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-10 22:03:54,658 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1194 [DEBUG]: Starting training session...
2025-05-10 22:03:54,659 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 1/100
2025-05-10 22:06:50,379 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-10 22:06:51,235 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -101.90308 ± 64.093
2025-05-10 22:06:51,235 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-53.695335, -17.477665, -183.9143, -221.04651, -93.742195, -168.09601, -99.45781, -66.28844, -34.54269, -80.7699]
2025-05-10 22:06:51,235 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [26.0, 16.0, 109.0, 133.0, 53.0, 127.0, 53.0, 29.0, 21.0, 37.0]
2025-05-10 22:06:51,235 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1226 [INFO]: New best (-101.90) for latency ExtremeClogL1U23
2025-05-10 22:06:51,236 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1229 [INFO]: saving network
2025-05-10 22:06:51,240 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-ant/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-10 22:06:51,245 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 2/100 (estimated time remaining: 4 hours, 51 minutes, 22 seconds)
2025-05-10 22:09:35,577 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-10 22:09:38,372 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -41.49546 ± 87.798
2025-05-10 22:09:38,373 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [9.811644, -6.268709, -33.054832, -18.485588, 9.762588, 20.119373, -92.50446, -288.82397, 0.52951944, -16.040113]
2025-05-10 22:09:38,373 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [15.0, 13.0, 124.0, 30.0, 35.0, 60.0, 233.0, 1000.0, 36.0, 147.0]
2025-05-10 22:09:38,373 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1226 [INFO]: New best (-41.50) for latency ExtremeClogL1U23
2025-05-10 22:09:38,373 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1229 [INFO]: saving network
2025-05-10 22:09:38,377 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-ant/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-10 22:09:38,383 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 3/100 (estimated time remaining: 4 hours, 40 minutes, 42 seconds)
2025-05-10 22:12:28,112 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-10 22:12:31,090 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -15.41353 ± 63.117
2025-05-10 22:12:31,090 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [6.861455, 2.1861374, 22.474005, -8.346157, -6.3254223, 8.6958885, 45.239788, -12.222985, -197.60403, -15.094027]
2025-05-10 22:12:31,090 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [95.0, 38.0, 71.0, 133.0, 26.0, 112.0, 212.0, 72.0, 1000.0, 66.0]
2025-05-10 22:12:31,090 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1226 [INFO]: New best (-15.41) for latency ExtremeClogL1U23
2025-05-10 22:12:31,090 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1229 [INFO]: saving network
2025-05-10 22:12:31,094 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-ant/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-10 22:12:31,100 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 4/100 (estimated time remaining: 4 hours, 38 minutes, 18 seconds)
2025-05-10 22:15:54,426 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-10 22:16:08,667 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 201.94397 ± 121.377
2025-05-10 22:16:08,668 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [349.0966, 7.8151207, 326.47702, 197.10272, 289.2985, 15.143616, 324.08047, 247.6583, 169.8612, 92.90609]
2025-05-10 22:16:08,668 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 67.0, 1000.0, 1000.0, 556.0, 42.0, 1000.0, 1000.0, 393.0, 295.0]
2025-05-10 22:16:08,668 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1226 [INFO]: New best (201.94) for latency ExtremeClogL1U23
2025-05-10 22:16:08,668 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1229 [INFO]: saving network
2025-05-10 22:16:08,673 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-ant/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-10 22:16:08,710 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 5/100 (estimated time remaining: 4 hours, 53 minutes, 37 seconds)
2025-05-10 22:20:02,868 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-10 22:20:13,469 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 153.46457 ± 75.713
2025-05-10 22:20:13,470 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [188.34955, 270.03586, 187.68654, 107.09163, 169.31921, 56.83307, 59.713066, 194.11525, 248.60509, 52.896397]
2025-05-10 22:20:13,470 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [625.0, 949.0, 605.0, 250.0, 1000.0, 159.0, 410.0, 482.0, 629.0, 143.0]
2025-05-10 22:20:13,472 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 6/100 (estimated time remaining: 5 hours, 9 minutes, 57 seconds)
2025-05-10 22:23:17,791 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-10 22:23:26,868 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 140.73381 ± 69.682
2025-05-10 22:23:26,868 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [190.82971, 206.5343, 96.945366, 11.075282, 194.60446, 176.79631, 49.8019, 84.06576, 216.61705, 180.06798]
2025-05-10 22:23:26,868 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [385.0, 509.0, 1000.0, 97.0, 1000.0, 429.0, 234.0, 137.0, 604.0, 1000.0]
2025-05-10 22:23:26,870 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 7/100 (estimated time remaining: 5 hours, 11 minutes, 57 seconds)
2025-05-10 22:26:28,172 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-10 22:26:35,889 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 110.77586 ± 87.683
2025-05-10 22:26:35,889 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [123.33227, 70.99249, 127.31346, 70.25527, 128.33205, 43.108112, 339.51807, 50.60274, 5.967808, 148.33624]
2025-05-10 22:26:35,889 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [445.0, 168.0, 481.0, 131.0, 232.0, 95.0, 1000.0, 1000.0, 21.0, 1000.0]
2025-05-10 22:26:35,891 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 8/100 (estimated time remaining: 5 hours, 15 minutes, 25 seconds)
2025-05-10 22:29:33,818 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-10 22:29:43,253 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 187.40472 ± 96.395
2025-05-10 22:29:43,253 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [163.7732, 186.349, 76.83128, 343.7596, 113.852745, 195.3134, 142.8552, 71.89554, 206.17844, 373.23886]
2025-05-10 22:29:43,253 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 376.0, 182.0, 1000.0, 294.0, 1000.0, 330.0, 125.0, 443.0, 980.0]
2025-05-10 22:29:43,255 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 9/100 (estimated time remaining: 5 hours, 16 minutes, 31 seconds)
2025-05-10 22:32:45,027 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-10 22:32:56,052 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 262.57916 ± 134.011
2025-05-10 22:32:56,052 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [83.52842, 241.585, 397.20306, 467.93365, 201.29756, 234.48418, 22.758488, 238.48233, 395.65607, 342.86282]
2025-05-10 22:32:56,052 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [123.0, 1000.0, 721.0, 854.0, 339.0, 402.0, 27.0, 1000.0, 1000.0, 1000.0]
2025-05-10 22:32:56,053 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1226 [INFO]: New best (262.58) for latency ExtremeClogL1U23
2025-05-10 22:32:56,053 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1229 [INFO]: saving network
2025-05-10 22:32:56,056 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-ant/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-10 22:32:56,064 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 10/100 (estimated time remaining: 5 hours, 5 minutes, 33 seconds)
2025-05-10 22:35:47,969 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-10 22:35:59,732 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 281.32507 ± 118.979
2025-05-10 22:35:59,732 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [155.52795, 394.09753, 249.17017, 480.23407, 435.97443, 151.21988, 369.33652, 193.64388, 182.21698, 201.82907]
2025-05-10 22:35:59,732 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [279.0, 1000.0, 1000.0, 1000.0, 1000.0, 192.0, 597.0, 1000.0, 477.0, 367.0]
2025-05-10 22:35:59,732 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1226 [INFO]: New best (281.33) for latency ExtremeClogL1U23
2025-05-10 22:35:59,733 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1229 [INFO]: saving network
2025-05-10 22:35:59,736 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-ant/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-10 22:35:59,743 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 11/100 (estimated time remaining: 4 hours, 43 minutes, 52 seconds)
2025-05-10 22:38:47,673 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-10 22:38:56,236 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 243.64522 ± 181.652
2025-05-10 22:38:56,236 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [148.62596, 80.24988, 509.80457, 155.45976, 50.74397, 125.822784, 442.74698, 233.35826, 573.20404, 116.435875]
2025-05-10 22:38:56,236 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [238.0, 140.0, 1000.0, 491.0, 62.0, 248.0, 1000.0, 345.0, 1000.0, 230.0]
2025-05-10 22:38:56,238 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 12/100 (estimated time remaining: 4 hours, 35 minutes, 42 seconds)
2025-05-10 22:41:55,895 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-10 22:42:13,084 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 403.55408 ± 145.232
2025-05-10 22:42:13,084 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [463.63687, 587.07825, 338.02557, 557.66345, 46.126686, 305.8864, 394.07343, 463.577, 408.22812, 471.24536]
2025-05-10 22:42:13,084 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 53.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 22:42:13,084 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1226 [INFO]: New best (403.55) for latency ExtremeClogL1U23
2025-05-10 22:42:13,085 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1229 [INFO]: saving network
2025-05-10 22:42:13,089 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-ant/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-10 22:42:13,096 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 13/100 (estimated time remaining: 4 hours, 34 minutes, 54 seconds)
2025-05-10 22:45:25,030 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-10 22:45:42,038 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 568.49884 ± 168.869
2025-05-10 22:45:42,038 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [611.84045, 547.99475, 678.9353, 727.00696, 670.9489, 468.5189, 525.3324, 117.81325, 659.4871, 677.11035]
2025-05-10 22:45:42,039 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 175.0, 1000.0, 1000.0]
2025-05-10 22:45:42,039 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1226 [INFO]: New best (568.50) for latency ExtremeClogL1U23
2025-05-10 22:45:42,039 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1229 [INFO]: saving network
2025-05-10 22:45:42,043 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-ant/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-10 22:45:42,051 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 14/100 (estimated time remaining: 4 hours, 38 minutes, 3 seconds)
2025-05-10 22:48:27,502 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-10 22:48:40,924 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 550.88800 ± 217.801
2025-05-10 22:48:40,924 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [185.06146, 673.9567, 863.65424, 859.8659, 454.79916, 331.90793, 389.8374, 719.3825, 420.21515, 610.19934]
2025-05-10 22:48:40,924 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [316.0, 1000.0, 1000.0, 1000.0, 614.0, 455.0, 435.0, 1000.0, 515.0, 1000.0]
2025-05-10 22:48:40,926 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 15/100 (estimated time remaining: 4 hours, 30 minutes, 51 seconds)
2025-05-10 22:51:44,491 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-10 22:51:58,610 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 624.43005 ± 181.164
2025-05-10 22:51:58,610 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [113.41691, 593.09357, 714.5015, 774.8391, 729.83813, 746.6125, 711.2076, 653.972, 599.8169, 607.002]
2025-05-10 22:51:58,611 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [106.0, 652.0, 1000.0, 842.0, 1000.0, 1000.0, 1000.0, 732.0, 723.0, 1000.0]
2025-05-10 22:51:58,611 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1226 [INFO]: New best (624.43) for latency ExtremeClogL1U23
2025-05-10 22:51:58,611 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1229 [INFO]: saving network
2025-05-10 22:51:58,615 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-ant/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-10 22:51:58,623 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 16/100 (estimated time remaining: 4 hours, 31 minutes, 40 seconds)
2025-05-10 22:55:08,434 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-10 22:55:27,781 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 732.39563 ± 146.242
2025-05-10 22:55:27,782 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [511.30417, 607.75037, 652.31757, 543.89813, 857.6189, 948.78467, 840.6358, 809.9394, 884.23517, 667.47156]
2025-05-10 22:55:27,782 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 22:55:27,782 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1226 [INFO]: New best (732.40) for latency ExtremeClogL1U23
2025-05-10 22:55:27,782 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1229 [INFO]: saving network
2025-05-10 22:55:27,786 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-ant/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-10 22:55:27,794 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 17/100 (estimated time remaining: 4 hours, 37 minutes, 38 seconds)
2025-05-10 22:58:21,753 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-10 22:58:39,331 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 730.31042 ± 258.095
2025-05-10 22:58:39,332 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [660.35175, 152.8751, 838.98334, 1036.8826, 839.61127, 882.3048, 923.12616, 358.7474, 821.41754, 788.8041]
2025-05-10 22:58:39,332 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 174.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 22:58:39,334 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 18/100 (estimated time remaining: 4 hours, 32 minutes, 51 seconds)
2025-05-10 23:01:28,879 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-10 23:01:44,602 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 645.00739 ± 269.563
2025-05-10 23:01:44,602 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [675.97266, 810.7574, 844.22705, 792.56604, 151.58955, 878.1848, 157.29573, 917.1675, 722.2688, 500.04443]
2025-05-10 23:01:44,602 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 157.0, 1000.0, 161.0, 1000.0, 877.0, 1000.0]
2025-05-10 23:01:44,605 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 19/100 (estimated time remaining: 4 hours, 23 minutes, 5 seconds)
2025-05-10 23:04:49,732 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-10 23:05:04,385 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 557.60352 ± 264.807
2025-05-10 23:05:04,385 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [325.66284, 705.8043, 732.1357, 45.732506, 712.89685, 694.8097, 786.6482, 691.4439, 130.42726, 750.4742]
2025-05-10 23:05:04,385 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [274.0, 1000.0, 1000.0, 42.0, 1000.0, 1000.0, 1000.0, 1000.0, 116.0, 1000.0]
2025-05-10 23:05:04,388 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 20/100 (estimated time remaining: 4 hours, 25 minutes, 32 seconds)
2025-05-10 23:07:58,299 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-10 23:08:15,670 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 699.81647 ± 225.914
2025-05-10 23:08:15,670 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [771.33, 729.9446, 832.4717, 707.6786, 731.1886, 891.3178, 699.5806, 833.6999, 755.37115, 45.581505]
2025-05-10 23:08:15,670 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 54.0]
2025-05-10 23:08:15,673 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 21/100 (estimated time remaining: 4 hours, 20 minutes, 32 seconds)
2025-05-10 23:11:31,385 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-10 23:11:51,296 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 762.85315 ± 117.316
2025-05-10 23:11:51,296 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [696.49915, 811.7178, 778.6844, 740.75037, 862.8143, 793.1496, 935.1315, 465.92685, 747.6329, 796.2248]
2025-05-10 23:11:51,296 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 23:11:51,297 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1226 [INFO]: New best (762.85) for latency ExtremeClogL1U23
2025-05-10 23:11:51,297 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1229 [INFO]: saving network
2025-05-10 23:11:51,301 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-ant/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-10 23:11:51,310 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 22/100 (estimated time remaining: 4 hours, 18 minutes, 59 seconds)
2025-05-10 23:14:44,234 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-10 23:14:59,801 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 588.18176 ± 217.889
2025-05-10 23:14:59,801 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [602.49554, 773.0474, 544.2299, 810.3074, 689.27155, 726.9864, 35.31109, 555.1425, 734.3329, 410.6924]
2025-05-10 23:14:59,801 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [576.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 42.0, 1000.0, 1000.0, 346.0]
2025-05-10 23:14:59,805 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 23/100 (estimated time remaining: 4 hours, 14 minutes, 55 seconds)
2025-05-10 23:17:56,792 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-10 23:18:13,285 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 696.44836 ± 190.137
2025-05-10 23:18:13,285 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [853.59875, 589.13293, 806.2864, 737.7493, 493.43054, 740.94214, 695.47205, 276.04712, 786.387, 985.43726]
2025-05-10 23:18:13,285 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 799.0, 1000.0, 439.0, 1000.0, 1000.0, 319.0, 1000.0, 1000.0]
2025-05-10 23:18:13,289 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 24/100 (estimated time remaining: 4 hours, 13 minutes, 45 seconds)
2025-05-10 23:21:15,831 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-10 23:21:30,239 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 568.21887 ± 307.651
2025-05-10 23:21:30,239 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [630.04047, 650.09467, 845.6176, 85.62297, 142.4833, 554.7445, 791.84033, 977.56024, 171.6579, 832.5273]
2025-05-10 23:21:30,240 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 80.0, 117.0, 1000.0, 1000.0, 1000.0, 142.0, 1000.0]
2025-05-10 23:21:30,244 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 25/100 (estimated time remaining: 4 hours, 9 minutes, 45 seconds)
2025-05-10 23:24:24,996 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-10 23:24:39,716 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 617.78442 ± 214.803
2025-05-10 23:24:39,716 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [825.5601, 724.1966, 704.62384, 614.4118, 796.0804, 410.91638, 671.17096, 862.52826, 143.75922, 424.5967]
2025-05-10 23:24:39,716 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 410.0, 1000.0, 726.0, 134.0, 373.0]
2025-05-10 23:24:39,720 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 26/100 (estimated time remaining: 4 hours, 6 minutes)
2025-05-10 23:27:38,752 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-10 23:27:54,757 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 696.87030 ± 378.340
2025-05-10 23:27:54,757 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [76.58359, 676.4171, 283.09335, 1127.9408, 1253.5631, 731.9246, 679.2003, 636.5504, 322.7009, 1180.7288]
2025-05-10 23:27:54,757 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [98.0, 1000.0, 222.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 23:27:54,762 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 27/100 (estimated time remaining: 3 hours, 57 minutes, 39 seconds)
2025-05-10 23:31:00,095 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-10 23:31:11,264 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 445.06924 ± 311.004
2025-05-10 23:31:11,265 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [94.422676, 330.03915, 1104.2354, 94.72394, 693.5693, 626.928, 657.1859, 354.8115, 401.8037, 92.97298]
2025-05-10 23:31:11,265 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [68.0, 240.0, 1000.0, 80.0, 1000.0, 1000.0, 1000.0, 325.0, 1000.0, 72.0]
2025-05-10 23:31:11,269 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 28/100 (estimated time remaining: 3 hours, 56 minutes, 23 seconds)
2025-05-10 23:34:00,891 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-10 23:34:14,825 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 652.41028 ± 280.312
2025-05-10 23:34:14,825 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [600.672, 524.9195, 211.13487, 800.97943, 622.8463, 776.1273, 895.4284, 1041.069, 143.69237, 907.2335]
2025-05-10 23:34:14,825 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [492.0, 520.0, 230.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 126.0, 1000.0]
2025-05-10 23:34:14,829 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 29/100 (estimated time remaining: 3 hours, 50 minutes, 46 seconds)
2025-05-10 23:37:18,982 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-10 23:37:26,655 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 438.88177 ± 372.125
2025-05-10 23:37:26,655 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [355.88132, 117.456474, 1139.5499, 59.778202, 328.48392, 881.04333, 309.40442, 125.6816, 922.15424, 149.38419]
2025-05-10 23:37:26,655 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [406.0, 114.0, 1000.0, 46.0, 345.0, 781.0, 326.0, 132.0, 1000.0, 149.0]
2025-05-10 23:37:26,660 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 30/100 (estimated time remaining: 3 hours, 46 minutes, 21 seconds)
2025-05-10 23:40:33,258 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-10 23:40:46,515 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 604.64630 ± 269.956
2025-05-10 23:40:46,515 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [666.4498, 720.72015, 1021.2932, 259.07812, 601.4894, 634.80884, 287.9708, 195.87354, 986.1359, 672.64374]
2025-05-10 23:40:46,515 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 813.0, 189.0, 1000.0, 536.0, 227.0, 162.0, 1000.0, 1000.0]
2025-05-10 23:40:46,520 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 31/100 (estimated time remaining: 3 hours, 45 minutes, 35 seconds)
2025-05-10 23:43:39,735 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-10 23:43:48,917 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 488.47192 ± 303.273
2025-05-10 23:43:48,918 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [753.19006, 82.210655, 198.1928, 373.00375, 804.4985, 932.0544, 662.4239, 253.31628, 101.93963, 723.8897]
2025-05-10 23:43:48,918 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 66.0, 165.0, 287.0, 1000.0, 1000.0, 532.0, 223.0, 64.0, 635.0]
2025-05-10 23:43:48,923 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 32/100 (estimated time remaining: 3 hours, 39 minutes, 27 seconds)
2025-05-10 23:46:42,504 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-10 23:46:49,323 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 399.47943 ± 268.754
2025-05-10 23:46:49,323 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [292.31427, 369.57095, 170.26985, 73.64148, 437.8976, 121.00213, 685.6462, 206.86795, 791.0433, 846.5409]
2025-05-10 23:46:49,323 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [202.0, 435.0, 116.0, 55.0, 315.0, 101.0, 1000.0, 149.0, 1000.0, 743.0]
2025-05-10 23:46:49,328 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 33/100 (estimated time remaining: 3 hours, 32 minutes, 37 seconds)
2025-05-10 23:49:52,287 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-10 23:50:03,130 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 657.20032 ± 388.150
2025-05-10 23:50:03,131 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [736.9311, 769.74866, 53.097446, 255.77126, 1317.1361, 1195.692, 705.5232, 816.59076, 233.27318, 488.23956]
2025-05-10 23:50:03,131 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [524.0, 671.0, 61.0, 211.0, 1000.0, 966.0, 1000.0, 1000.0, 190.0, 385.0]
2025-05-10 23:50:03,135 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 34/100 (estimated time remaining: 3 hours, 31 minutes, 47 seconds)
2025-05-10 23:53:06,350 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-10 23:53:14,927 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 495.04938 ± 368.293
2025-05-10 23:53:14,927 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [559.60626, 480.68103, 508.1819, 85.45808, 1240.2665, 197.5748, 911.80304, 750.2791, 74.368996, 142.2738]
2025-05-10 23:53:14,927 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [463.0, 424.0, 357.0, 76.0, 1000.0, 166.0, 1000.0, 1000.0, 92.0, 117.0]
2025-05-10 23:53:14,933 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 35/100 (estimated time remaining: 3 hours, 28 minutes, 37 seconds)
2025-05-10 23:56:03,494 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-10 23:56:18,680 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 929.37286 ± 317.526
2025-05-10 23:56:18,681 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [1176.1715, 1135.7809, 145.96854, 1257.6416, 642.6728, 1103.8278, 943.2943, 797.79974, 1159.7947, 930.77637]
2025-05-10 23:56:18,681 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [881.0, 1000.0, 129.0, 973.0, 414.0, 1000.0, 726.0, 1000.0, 1000.0, 1000.0]
2025-05-10 23:56:18,681 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1226 [INFO]: New best (929.37) for latency ExtremeClogL1U23
2025-05-10 23:56:18,681 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1229 [INFO]: saving network
2025-05-10 23:56:18,685 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-ant/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-10 23:56:18,695 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 36/100 (estimated time remaining: 3 hours, 21 minutes, 58 seconds)
2025-05-10 23:59:17,212 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-10 23:59:29,151 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 648.83270 ± 461.313
2025-05-10 23:59:29,151 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [1191.5126, 86.28634, 200.3509, 33.868202, 1333.5615, 739.85297, 921.5032, 170.00227, 793.66626, 1017.7226]
2025-05-10 23:59:29,151 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 148.0, 137.0, 33.0, 1000.0, 1000.0, 1000.0, 123.0, 1000.0, 812.0]
2025-05-10 23:59:29,157 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 37/100 (estimated time remaining: 3 hours, 20 minutes, 34 seconds)
2025-05-11 00:02:31,102 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 00:02:42,581 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 648.33759 ± 335.068
2025-05-11 00:02:42,581 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [634.973, 864.2474, 847.74084, 423.43906, 74.234985, 578.11957, 1183.5306, 236.50829, 1080.9359, 559.64575]
2025-05-11 00:02:42,581 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [602.0, 1000.0, 591.0, 340.0, 48.0, 415.0, 1000.0, 192.0, 1000.0, 1000.0]
2025-05-11 00:02:42,587 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 38/100 (estimated time remaining: 3 hours, 20 minutes, 11 seconds)
2025-05-11 00:05:36,230 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 00:05:47,221 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 817.08679 ± 448.218
2025-05-11 00:05:47,221 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [1397.9727, 790.10596, 917.8449, 207.99947, 1247.0447, 169.82922, 334.13068, 695.7713, 943.50653, 1466.6633]
2025-05-11 00:05:47,221 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 526.0, 1000.0, 150.0, 728.0, 110.0, 261.0, 646.0, 750.0, 1000.0]
2025-05-11 00:05:47,226 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 39/100 (estimated time remaining: 3 hours, 15 minutes, 6 seconds)
2025-05-11 00:09:00,296 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 00:09:15,535 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 945.33020 ± 364.944
2025-05-11 00:09:15,535 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [1343.7162, 1445.8582, 923.7708, 427.84576, 689.546, 788.34247, 1327.9268, 352.33566, 951.3051, 1202.6554]
2025-05-11 00:09:15,535 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 990.0, 1000.0, 287.0, 1000.0, 551.0, 1000.0, 274.0, 1000.0, 1000.0]
2025-05-11 00:09:15,535 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1226 [INFO]: New best (945.33) for latency ExtremeClogL1U23
2025-05-11 00:09:15,536 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1229 [INFO]: saving network
2025-05-11 00:09:15,539 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-ant/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 00:09:15,551 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 40/100 (estimated time remaining: 3 hours, 15 minutes, 19 seconds)
2025-05-11 00:12:04,301 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 00:12:18,270 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 903.89862 ± 322.819
2025-05-11 00:12:18,270 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [1384.145, 876.0871, 880.2044, 594.3692, 258.05478, 630.2061, 1297.2151, 1108.8485, 1055.9291, 953.9274]
2025-05-11 00:12:18,270 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [981.0, 1000.0, 742.0, 446.0, 240.0, 488.0, 1000.0, 1000.0, 1000.0, 742.0]
2025-05-11 00:12:18,277 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 41/100 (estimated time remaining: 3 hours, 11 minutes, 54 seconds)
2025-05-11 00:15:17,897 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 00:15:31,345 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 794.20416 ± 373.682
2025-05-11 00:15:31,345 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [1056.3396, 1088.1494, 1067.6733, 344.99762, 651.2734, 330.0536, 1209.7373, 154.89041, 867.885, 1171.0426]
2025-05-11 00:15:31,345 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [889.0, 1000.0, 928.0, 247.0, 1000.0, 271.0, 847.0, 129.0, 1000.0, 901.0]
2025-05-11 00:15:31,350 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 42/100 (estimated time remaining: 3 hours, 9 minutes, 13 seconds)
2025-05-11 00:18:35,278 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 00:18:49,283 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 595.05121 ± 209.585
2025-05-11 00:18:49,284 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [483.0255, 920.17346, 633.34375, 760.9802, 248.04556, 633.6787, 845.5153, 333.52618, 415.3784, 676.8451]
2025-05-11 00:18:49,284 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [381.0, 1000.0, 1000.0, 1000.0, 180.0, 1000.0, 1000.0, 267.0, 437.0, 1000.0]
2025-05-11 00:18:49,290 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 43/100 (estimated time remaining: 3 hours, 6 minutes, 53 seconds)
2025-05-11 00:21:48,671 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 00:22:00,404 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 943.94012 ± 534.891
2025-05-11 00:22:00,404 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [1260.9832, 678.54486, 105.89506, 327.29794, 194.9261, 1559.128, 1457.1884, 1145.188, 1469.1858, 1241.0631]
2025-05-11 00:22:00,404 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [854.0, 505.0, 93.0, 246.0, 156.0, 1000.0, 1000.0, 760.0, 1000.0, 912.0]
2025-05-11 00:22:00,409 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 44/100 (estimated time remaining: 3 hours, 4 minutes, 54 seconds)
2025-05-11 00:24:54,273 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 00:25:09,206 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 971.92053 ± 378.059
2025-05-11 00:25:09,206 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [957.2251, 758.46545, 1381.4489, 851.24725, 1070.8291, 175.30194, 1364.3134, 548.5324, 1289.9962, 1321.8453]
2025-05-11 00:25:09,206 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 651.0, 731.0, 132.0, 1000.0, 498.0, 1000.0, 1000.0]
2025-05-11 00:25:09,207 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1226 [INFO]: New best (971.92) for latency ExtremeClogL1U23
2025-05-11 00:25:09,207 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1229 [INFO]: saving network
2025-05-11 00:25:09,211 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-ant/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 00:25:09,223 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 45/100 (estimated time remaining: 2 hours, 58 minutes, 1 second)
2025-05-11 00:28:07,621 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 00:28:21,477 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 943.22577 ± 299.756
2025-05-11 00:28:21,477 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [604.7447, 1348.4441, 1083.4695, 1078.9652, 1402.9633, 621.37885, 728.5896, 1094.9423, 499.42673, 969.33435]
2025-05-11 00:28:21,477 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [498.0, 1000.0, 1000.0, 1000.0, 1000.0, 560.0, 489.0, 1000.0, 400.0, 627.0]
2025-05-11 00:28:21,484 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 46/100 (estimated time remaining: 2 hours, 56 minutes, 35 seconds)
2025-05-11 00:31:23,457 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 00:31:34,238 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 743.02020 ± 473.165
2025-05-11 00:31:34,238 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [959.1296, 826.00214, 1620.7655, 850.9172, 483.6375, 88.767265, 1186.2844, 224.79878, 148.31123, 1041.5884]
2025-05-11 00:31:34,238 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [537.0, 591.0, 1000.0, 1000.0, 438.0, 55.0, 1000.0, 178.0, 142.0, 988.0]
2025-05-11 00:31:34,245 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 47/100 (estimated time remaining: 2 hours, 53 minutes, 19 seconds)
2025-05-11 00:34:18,571 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 00:34:33,698 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 969.16858 ± 385.853
2025-05-11 00:34:33,698 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [816.475, 271.16312, 1392.7242, 1385.228, 1127.2654, 887.35516, 944.1201, 1106.7029, 351.75284, 1408.8989]
2025-05-11 00:34:33,698 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [549.0, 220.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 306.0, 1000.0]
2025-05-11 00:34:33,705 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 48/100 (estimated time remaining: 2 hours, 46 minutes, 50 seconds)
2025-05-11 00:37:30,524 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 00:37:42,971 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 817.95483 ± 521.771
2025-05-11 00:37:42,971 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [126.27618, 1404.7006, 461.4553, 165.29927, 1408.415, 1489.0079, 1200.0564, 697.35144, 1003.1088, 223.87791]
2025-05-11 00:37:42,971 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [85.0, 1000.0, 351.0, 126.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 169.0]
2025-05-11 00:37:43,001 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 49/100 (estimated time remaining: 2 hours, 43 minutes, 22 seconds)
2025-05-11 00:40:40,751 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 00:40:51,945 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 701.16223 ± 342.663
2025-05-11 00:40:51,946 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [1209.9581, 490.0962, 467.05704, 109.69275, 1312.465, 510.13922, 753.4382, 832.5399, 534.5773, 791.6586]
2025-05-11 00:40:51,946 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [998.0, 310.0, 335.0, 81.0, 1000.0, 347.0, 1000.0, 648.0, 393.0, 1000.0]
2025-05-11 00:40:51,952 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 50/100 (estimated time remaining: 2 hours, 40 minutes, 15 seconds)
2025-05-11 00:43:43,507 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 00:43:58,409 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 1172.33301 ± 400.080
2025-05-11 00:43:58,409 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [589.8922, 256.3865, 1462.0013, 1101.1284, 1246.8494, 1532.3994, 1352.7042, 1477.9243, 1414.3628, 1289.6815]
2025-05-11 00:43:58,409 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [382.0, 220.0, 1000.0, 801.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 811.0]
2025-05-11 00:43:58,410 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1226 [INFO]: New best (1172.33) for latency ExtremeClogL1U23
2025-05-11 00:43:58,410 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1229 [INFO]: saving network
2025-05-11 00:43:58,413 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-ant/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 00:43:58,426 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 51/100 (estimated time remaining: 2 hours, 36 minutes, 9 seconds)
2025-05-11 00:47:05,900 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 00:47:18,712 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 1006.14014 ± 446.449
2025-05-11 00:47:18,712 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [301.4063, 1473.1239, 797.6323, 1297.8809, 1251.5614, 1524.3654, 1164.9594, 365.628, 1376.8577, 507.98633]
2025-05-11 00:47:18,712 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [207.0, 1000.0, 500.0, 1000.0, 1000.0, 1000.0, 724.0, 295.0, 1000.0, 402.0]
2025-05-11 00:47:18,720 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 52/100 (estimated time remaining: 2 hours, 34 minutes, 15 seconds)
2025-05-11 00:50:18,226 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 00:50:31,339 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 650.65961 ± 398.833
2025-05-11 00:50:31,339 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [1451.608, 426.55148, 1219.0425, 326.55344, 656.2025, 674.14594, 605.0548, 40.148567, 749.3773, 357.91156]
2025-05-11 00:50:31,339 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [984.0, 256.0, 1000.0, 245.0, 1000.0, 1000.0, 1000.0, 46.0, 1000.0, 256.0]
2025-05-11 00:50:31,346 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 53/100 (estimated time remaining: 2 hours, 33 minutes, 13 seconds)
2025-05-11 00:53:17,106 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 00:53:26,367 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 507.70673 ± 276.076
2025-05-11 00:53:26,367 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [342.0861, 286.2115, 933.2182, 834.82495, 129.4028, 778.9748, 422.5502, 619.50946, 119.66154, 610.6276]
2025-05-11 00:53:26,367 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [314.0, 230.0, 1000.0, 589.0, 85.0, 1000.0, 265.0, 443.0, 72.0, 1000.0]
2025-05-11 00:53:26,374 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 54/100 (estimated time remaining: 2 hours, 27 minutes, 47 seconds)
2025-05-11 00:56:36,696 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 00:56:44,354 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 391.91803 ± 303.947
2025-05-11 00:56:44,355 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [72.96515, 1151.4237, 406.04236, 296.82056, 73.33769, 183.50165, 385.30612, 432.5028, 258.02817, 659.25226]
2025-05-11 00:56:44,355 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [67.0, 828.0, 366.0, 1000.0, 58.0, 140.0, 245.0, 348.0, 208.0, 1000.0]
2025-05-11 00:56:44,363 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 55/100 (estimated time remaining: 2 hours, 26 minutes, 2 seconds)
2025-05-11 00:59:26,999 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 00:59:38,078 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 647.90057 ± 373.671
2025-05-11 00:59:38,079 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [453.99222, 820.07874, 477.34467, 516.7543, 870.1382, 1569.9371, 179.60706, 722.675, 251.16177, 617.3171]
2025-05-11 00:59:38,079 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [302.0, 1000.0, 378.0, 468.0, 1000.0, 1000.0, 104.0, 501.0, 200.0, 1000.0]
2025-05-11 00:59:38,086 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 56/100 (estimated time remaining: 2 hours, 20 minutes, 56 seconds)
2025-05-11 01:02:43,312 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 01:02:48,040 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 435.64850 ± 423.041
2025-05-11 01:02:48,040 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [1583.0968, 447.26562, 304.71716, 173.9729, 410.69467, 110.60565, 274.50848, 744.48444, 137.94263, 169.19676]
2025-05-11 01:02:48,041 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 228.0, 224.0, 102.0, 310.0, 64.0, 197.0, 498.0, 100.0, 115.0]
2025-05-11 01:02:48,048 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 57/100 (estimated time remaining: 2 hours, 16 minutes, 18 seconds)
2025-05-11 01:05:43,235 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 01:05:56,018 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 948.69202 ± 450.233
2025-05-11 01:05:56,019 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [1581.987, 517.75824, 593.65125, 1653.6595, 559.29553, 1253.2727, 1072.1886, 247.56322, 1176.9784, 830.5657]
2025-05-11 01:05:56,019 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 409.0, 1000.0, 905.0, 413.0, 799.0, 672.0, 228.0, 1000.0, 1000.0]
2025-05-11 01:05:56,027 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 58/100 (estimated time remaining: 2 hours, 12 minutes, 32 seconds)
2025-05-11 01:08:53,103 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 01:09:04,797 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 1007.44238 ± 489.047
2025-05-11 01:09:04,797 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [1774.7953, 637.8632, 644.7838, 360.8437, 1157.6572, 1474.2479, 1260.7345, 1419.3534, 1128.9083, 215.23615]
2025-05-11 01:09:04,797 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 484.0, 1000.0, 263.0, 668.0, 1000.0, 838.0, 1000.0, 732.0, 107.0]
2025-05-11 01:09:04,804 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 59/100 (estimated time remaining: 2 hours, 11 minutes, 22 seconds)
2025-05-11 01:11:58,200 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 01:12:12,543 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 870.64325 ± 520.073
2025-05-11 01:12:12,543 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [522.1127, 19.475729, 600.07385, 888.4653, 1552.2625, 237.07362, 1399.1906, 1384.3118, 657.74664, 1445.7198]
2025-05-11 01:12:12,543 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 21.0, 1000.0, 654.0, 1000.0, 132.0, 1000.0, 1000.0, 1000.0, 990.0]
2025-05-11 01:12:12,551 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 6 minutes, 51 seconds)
2025-05-11 01:15:18,138 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 01:15:31,193 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 1026.77625 ± 561.673
2025-05-11 01:15:31,194 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [1351.8832, 1585.5311, 779.2941, 1406.1556, 52.813694, 1700.5824, 1192.033, 250.90344, 477.1547, 1471.4115]
2025-05-11 01:15:31,194 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 38.0, 1000.0, 883.0, 191.0, 279.0, 903.0]
2025-05-11 01:15:31,201 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 61/100 (estimated time remaining: 2 hours, 7 minutes, 4 seconds)
2025-05-11 01:18:18,377 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 01:18:27,812 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 804.97552 ± 530.882
2025-05-11 01:18:27,813 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [66.70044, 977.8425, 308.80325, 1790.091, 372.7732, 1079.329, 1480.3849, 266.9465, 938.1949, 768.6896]
2025-05-11 01:18:27,813 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [47.0, 622.0, 168.0, 1000.0, 235.0, 1000.0, 1000.0, 177.0, 605.0, 468.0]
2025-05-11 01:18:27,821 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 62/100 (estimated time remaining: 2 hours, 2 minutes, 10 seconds)
2025-05-11 01:21:23,184 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 01:21:40,380 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 1349.72974 ± 320.207
2025-05-11 01:21:40,380 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [1555.1338, 1555.3209, 1429.5536, 1572.374, 1675.8916, 847.5335, 833.7915, 1616.3953, 934.0605, 1477.242]
2025-05-11 01:21:40,380 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 867.0, 1000.0, 1000.0, 1000.0, 499.0, 1000.0, 1000.0, 921.0]
2025-05-11 01:21:40,380 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1226 [INFO]: New best (1349.73) for latency ExtremeClogL1U23
2025-05-11 01:21:40,381 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1229 [INFO]: saving network
2025-05-11 01:21:40,385 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-ant/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 01:21:40,398 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 59 minutes, 37 seconds)
2025-05-11 01:24:49,135 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 01:25:02,213 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 944.41296 ± 497.679
2025-05-11 01:25:02,213 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [700.5855, 993.1991, 675.07544, 1956.2781, 811.9231, 680.40497, 1570.7296, 89.214836, 754.5906, 1212.1288]
2025-05-11 01:25:02,213 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 361.0, 1000.0, 1000.0, 396.0, 1000.0, 57.0, 483.0, 721.0]
2025-05-11 01:25:02,222 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 58 minutes, 4 seconds)
2025-05-11 01:27:52,425 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 01:28:05,387 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 1027.83533 ± 409.417
2025-05-11 01:28:05,387 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [668.5269, 323.33426, 854.4149, 1698.0214, 885.01294, 946.43835, 1582.2501, 1069.3777, 1454.4092, 796.568]
2025-05-11 01:28:05,387 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [421.0, 202.0, 508.0, 1000.0, 500.0, 582.0, 791.0, 1000.0, 1000.0, 1000.0]
2025-05-11 01:28:05,395 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 54 minutes, 20 seconds)
2025-05-11 01:31:03,868 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 01:31:12,074 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 592.14844 ± 579.856
2025-05-11 01:31:12,074 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [2006.4498, 885.4313, 36.327553, 191.38481, 908.78766, 251.2989, 19.010202, 789.30963, 132.85487, 700.6294]
2025-05-11 01:31:12,074 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 39.0, 137.0, 598.0, 117.0, 29.0, 1000.0, 108.0, 454.0]
2025-05-11 01:31:12,083 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 49 minutes, 46 seconds)
2025-05-11 01:34:00,730 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 01:34:12,046 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 962.57434 ± 618.473
2025-05-11 01:34:12,047 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [1224.6857, 842.6423, 538.5616, 57.992245, 1652.0596, 1622.9846, 402.92902, 167.6948, 1259.6959, 1856.4974]
2025-05-11 01:34:12,047 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 490.0, 314.0, 49.0, 1000.0, 903.0, 320.0, 116.0, 1000.0, 1000.0]
2025-05-11 01:34:12,057 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 47 minutes)
2025-05-11 01:37:12,354 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 01:37:27,773 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 1362.09058 ± 467.588
2025-05-11 01:37:27,773 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [1264.1597, 840.1279, 919.88196, 708.1663, 1885.9517, 944.523, 1889.6085, 1393.0714, 1870.2341, 1905.1821]
2025-05-11 01:37:27,773 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [704.0, 448.0, 1000.0, 450.0, 1000.0, 1000.0, 1000.0, 767.0, 1000.0, 1000.0]
2025-05-11 01:37:27,774 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1226 [INFO]: New best (1362.09) for latency ExtremeClogL1U23
2025-05-11 01:37:27,774 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1229 [INFO]: saving network
2025-05-11 01:37:27,777 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-ant/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 01:37:27,792 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 44 minutes, 12 seconds)
2025-05-11 01:40:25,680 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 01:40:37,655 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 1070.94434 ± 577.520
2025-05-11 01:40:37,656 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [1533.8229, 1702.8341, 55.4665, 1363.3147, 1745.7292, 1146.9158, 675.6958, 552.75336, 375.94843, 1556.9617]
2025-05-11 01:40:37,656 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [926.0, 1000.0, 38.0, 1000.0, 1000.0, 810.0, 368.0, 382.0, 226.0, 929.0]
2025-05-11 01:40:37,666 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 39 minutes, 46 seconds)
2025-05-11 01:43:38,036 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 01:43:49,268 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 936.67773 ± 524.509
2025-05-11 01:43:49,268 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [459.00113, 1465.8807, 1627.0665, 1405.6528, 457.86038, 588.1057, 477.50482, 1288.358, 1440.1215, 157.22612]
2025-05-11 01:43:49,268 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [295.0, 1000.0, 1000.0, 1000.0, 381.0, 299.0, 374.0, 1000.0, 763.0, 99.0]
2025-05-11 01:43:49,278 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 37 minutes, 32 seconds)
2025-05-11 01:46:43,829 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 01:46:51,509 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 698.21826 ± 595.318
2025-05-11 01:46:51,509 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [54.28105, 864.03015, 1868.6827, 221.36067, 573.21185, 861.61035, 418.48404, 265.10727, 189.12224, 1666.2922]
2025-05-11 01:46:51,509 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [39.0, 458.0, 1000.0, 130.0, 346.0, 1000.0, 288.0, 101.0, 166.0, 801.0]
2025-05-11 01:46:51,518 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 33 minutes, 56 seconds)
2025-05-11 01:49:47,134 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 01:50:01,468 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 1134.66870 ± 547.353
2025-05-11 01:50:01,468 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [413.10022, 1558.2694, 1642.7236, 678.6595, 1641.2169, 174.0923, 1742.8295, 740.883, 1439.4166, 1315.4962]
2025-05-11 01:50:01,468 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [234.0, 1000.0, 998.0, 1000.0, 1000.0, 99.0, 1000.0, 389.0, 1000.0, 1000.0]
2025-05-11 01:50:01,478 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 31 minutes, 46 seconds)
2025-05-11 01:53:00,102 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 01:53:09,571 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 805.87225 ± 480.807
2025-05-11 01:53:09,571 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [149.66913, 1857.9807, 409.0205, 943.10333, 1109.9452, 741.30585, 1252.9652, 318.72348, 575.9714, 700.038]
2025-05-11 01:53:09,571 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [100.0, 885.0, 260.0, 447.0, 653.0, 417.0, 1000.0, 173.0, 1000.0, 360.0]
2025-05-11 01:53:09,580 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 27 minutes, 54 seconds)
2025-05-11 01:56:02,621 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 01:56:08,675 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 626.50018 ± 519.500
2025-05-11 01:56:08,676 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [266.85553, 282.64542, 86.41751, 1677.7836, 701.00665, 487.26898, 272.6077, 1275.0912, 116.35016, 1098.9749]
2025-05-11 01:56:08,676 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [178.0, 206.0, 82.0, 1000.0, 360.0, 254.0, 174.0, 652.0, 67.0, 630.0]
2025-05-11 01:56:08,685 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 23 minutes, 47 seconds)
2025-05-11 01:59:04,407 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 01:59:13,448 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 694.19543 ± 418.222
2025-05-11 01:59:13,448 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [275.18515, 712.8195, 402.8392, 585.6673, 1147.5565, 630.3439, 118.77358, 602.0537, 1649.2341, 817.48175]
2025-05-11 01:59:13,448 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [220.0, 1000.0, 223.0, 264.0, 677.0, 361.0, 76.0, 1000.0, 847.0, 379.0]
2025-05-11 01:59:13,459 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 20 minutes, 5 seconds)
2025-05-11 02:02:14,501 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 02:02:25,016 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 843.51794 ± 695.127
2025-05-11 02:02:25,016 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [134.61975, 541.5188, 250.81331, 1700.8632, 52.878334, 1243.749, 649.3204, 1543.8958, 253.32248, 2064.1995]
2025-05-11 02:02:25,017 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [84.0, 334.0, 133.0, 1000.0, 42.0, 1000.0, 1000.0, 927.0, 164.0, 1000.0]
2025-05-11 02:02:25,026 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 17 minutes, 47 seconds)
2025-05-11 02:05:29,724 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 02:05:44,553 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 985.21423 ± 619.529
2025-05-11 02:05:44,553 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [1158.8397, 1385.104, 894.16956, 1508.2305, 773.9475, 1808.9166, 72.24883, 323.8748, 135.35774, 1791.453]
2025-05-11 02:05:44,553 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [651.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 75.0, 1000.0, 80.0, 1000.0]
2025-05-11 02:05:44,563 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 15 minutes, 26 seconds)
2025-05-11 02:08:33,240 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 02:08:47,714 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 1408.09827 ± 562.570
2025-05-11 02:08:47,714 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [745.60095, 1569.9215, 1808.9905, 1689.7725, 589.9469, 1933.502, 1701.9333, 385.42868, 1949.5044, 1706.3824]
2025-05-11 02:08:47,715 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [504.0, 1000.0, 943.0, 992.0, 305.0, 1000.0, 1000.0, 263.0, 1000.0, 1000.0]
2025-05-11 02:08:47,715 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1226 [INFO]: New best (1408.10) for latency ExtremeClogL1U23
2025-05-11 02:08:47,715 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1229 [INFO]: saving network
2025-05-11 02:08:47,719 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc4/noisy-ant/ExtremeClogL1U23-sac-aug-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 02:08:47,735 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 11 minutes, 55 seconds)
2025-05-11 02:11:49,760 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 02:11:58,011 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 884.84314 ± 709.076
2025-05-11 02:11:58,011 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [646.82654, 1149.1957, 337.37286, 1481.5992, 97.54293, 92.35504, 18.48434, 1176.4491, 1947.4716, 1901.1342]
2025-05-11 02:11:58,011 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [493.0, 715.0, 163.0, 1000.0, 55.0, 90.0, 21.0, 584.0, 1000.0, 1000.0]
2025-05-11 02:11:58,021 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 9 minutes, 37 seconds)
2025-05-11 02:14:54,665 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 02:15:09,649 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 1125.19080 ± 428.923
2025-05-11 02:15:09,649 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [999.551, 239.34363, 1492.6724, 785.39624, 1305.4034, 854.02014, 909.621, 1339.805, 1586.8478, 1739.2476]
2025-05-11 02:15:09,649 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 165.0, 1000.0, 1000.0, 1000.0, 1000.0, 599.0, 744.0, 1000.0, 1000.0]
2025-05-11 02:15:09,665 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 6 minutes, 56 seconds)
2025-05-11 02:17:59,513 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 02:18:09,681 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 887.03308 ± 622.088
2025-05-11 02:18:09,681 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [1916.0629, 2083.533, 1158.6853, 657.22736, 749.64233, 532.32684, 791.93665, 64.45075, 623.60443, 292.86108]
2025-05-11 02:18:09,681 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 321.0, 1000.0, 373.0, 361.0, 70.0, 305.0, 166.0]
2025-05-11 02:18:09,692 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 2 minutes, 58 seconds)
2025-05-11 02:21:07,340 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 02:21:16,033 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 942.72491 ± 579.689
2025-05-11 02:21:16,033 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [753.42645, 1865.646, 483.0354, 2000.6653, 202.822, 1268.4802, 1052.6432, 364.81534, 573.8842, 861.8315]
2025-05-11 02:21:16,033 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [289.0, 976.0, 285.0, 1000.0, 150.0, 783.0, 448.0, 288.0, 328.0, 553.0]
2025-05-11 02:21:16,044 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 82/100 (estimated time remaining: 58 minutes, 59 seconds)
2025-05-11 02:24:24,698 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 02:24:31,562 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 638.54285 ± 379.951
2025-05-11 02:24:31,562 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [1065.1904, 256.82983, 391.0574, 1170.9967, 985.4293, 1100.1072, 442.90378, 70.66775, 509.23727, 393.009]
2025-05-11 02:24:31,562 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 141.0, 227.0, 516.0, 506.0, 824.0, 191.0, 72.0, 314.0, 223.0]
2025-05-11 02:24:31,573 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 83/100 (estimated time remaining: 56 minutes, 37 seconds)
2025-05-11 02:27:16,873 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 02:27:21,813 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 464.31894 ± 381.833
2025-05-11 02:27:21,814 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [682.36426, 1003.5033, 79.03749, 1160.3718, 589.36786, 398.71027, 71.81956, 511.9792, 119.20836, 26.827023]
2025-05-11 02:27:21,814 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [328.0, 1000.0, 54.0, 650.0, 265.0, 205.0, 56.0, 221.0, 68.0, 24.0]
2025-05-11 02:27:21,826 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 84/100 (estimated time remaining: 52 minutes, 20 seconds)
2025-05-11 02:30:17,148 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 02:30:25,485 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 912.28381 ± 549.075
2025-05-11 02:30:25,486 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [1605.3372, 732.0002, 102.40174, 1293.0527, 1467.9553, 589.2654, 1589.3793, 255.40901, 330.8402, 1157.1964]
2025-05-11 02:30:25,486 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [823.0, 404.0, 71.0, 631.0, 1000.0, 293.0, 742.0, 144.0, 163.0, 561.0]
2025-05-11 02:30:25,498 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 85/100 (estimated time remaining: 48 minutes, 50 seconds)
2025-05-11 02:33:28,104 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 02:33:34,607 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 471.08432 ± 277.796
2025-05-11 02:33:34,607 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [216.76231, 243.33943, 1030.2993, 427.47998, 273.11148, 340.17606, 932.64636, 566.4374, 458.3213, 222.26958]
2025-05-11 02:33:34,607 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [128.0, 172.0, 1000.0, 340.0, 139.0, 231.0, 1000.0, 254.0, 229.0, 171.0]
2025-05-11 02:33:34,619 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 86/100 (estimated time remaining: 46 minutes, 14 seconds)
2025-05-11 02:36:36,669 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 02:36:41,988 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 462.01889 ± 251.168
2025-05-11 02:36:41,988 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [133.10406, 136.48338, 483.1706, 613.2495, 384.32974, 337.60034, 817.41534, 912.39746, 268.32465, 534.1142]
2025-05-11 02:36:41,988 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [77.0, 65.0, 418.0, 314.0, 194.0, 126.0, 1000.0, 424.0, 203.0, 268.0]
2025-05-11 02:36:41,999 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 87/100 (estimated time remaining: 43 minutes, 12 seconds)
2025-05-11 02:39:30,432 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 02:39:39,430 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 886.07092 ± 649.727
2025-05-11 02:39:39,430 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [24.843222, 1719.366, 269.6514, 659.95935, 219.0885, 357.0691, 1519.5544, 1706.478, 763.6464, 1621.0529]
2025-05-11 02:39:39,430 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [16.0, 890.0, 1000.0, 317.0, 96.0, 182.0, 688.0, 770.0, 399.0, 742.0]
2025-05-11 02:39:39,443 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 88/100 (estimated time remaining: 39 minutes, 20 seconds)
2025-05-11 02:42:30,623 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 02:42:39,629 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 634.56519 ± 481.097
2025-05-11 02:42:39,629 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [374.30637, 1467.7096, 326.98737, 67.492, 704.2946, 98.01347, 850.386, 1482.8821, 616.904, 356.6767]
2025-05-11 02:42:39,629 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [200.0, 1000.0, 181.0, 44.0, 352.0, 69.0, 1000.0, 836.0, 1000.0, 194.0]
2025-05-11 02:42:39,642 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 89/100 (estimated time remaining: 36 minutes, 42 seconds)
2025-05-11 02:45:31,518 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 02:45:45,318 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 931.27667 ± 433.163
2025-05-11 02:45:45,318 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [618.3945, 572.1676, 1067.556, 701.0822, 944.0625, 865.1464, 1400.7865, 129.98178, 1351.2341, 1662.3549]
2025-05-11 02:45:45,318 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 313.0, 1000.0, 1000.0, 472.0, 372.0, 1000.0, 90.0, 1000.0, 1000.0]
2025-05-11 02:45:45,330 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 90/100 (estimated time remaining: 33 minutes, 43 seconds)
2025-05-11 02:48:44,555 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 02:48:49,114 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 565.34875 ± 688.010
2025-05-11 02:48:49,114 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [470.0338, 258.46567, 218.24994, 49.97945, 1187.5958, 2405.5671, 120.07163, 539.4625, 140.01134, 264.04987]
2025-05-11 02:48:49,114 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [218.0, 166.0, 130.0, 35.0, 629.0, 1000.0, 81.0, 256.0, 70.0, 146.0]
2025-05-11 02:48:49,127 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 91/100 (estimated time remaining: 30 minutes, 29 seconds)
2025-05-11 02:51:49,177 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 02:51:56,975 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 643.55121 ± 460.298
2025-05-11 02:51:56,975 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [110.00554, 1201.6354, 109.27582, 235.47383, 1328.5498, 449.675, 1151.0228, 273.88113, 1040.8933, 535.0991]
2025-05-11 02:51:56,975 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [59.0, 664.0, 109.0, 131.0, 1000.0, 281.0, 716.0, 208.0, 1000.0, 197.0]
2025-05-11 02:51:56,988 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 92/100 (estimated time remaining: 27 minutes, 26 seconds)
2025-05-11 02:54:46,417 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 02:54:54,243 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 754.03333 ± 532.106
2025-05-11 02:54:54,244 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [637.11383, 16.996696, 1403.4642, 1123.2728, 698.22986, 1351.5913, 1003.56415, 1247.451, 24.067526, 34.582077]
2025-05-11 02:54:54,244 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 19.0, 606.0, 656.0, 325.0, 1000.0, 522.0, 596.0, 54.0, 44.0]
2025-05-11 02:54:54,255 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 93/100 (estimated time remaining: 24 minutes, 23 seconds)
2025-05-11 02:58:02,188 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 02:58:10,338 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 535.30322 ± 380.308
2025-05-11 02:58:10,338 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [721.9342, 378.27658, 94.3418, 1198.7377, 706.9406, 309.3014, 83.835976, 169.62126, 1111.7893, 578.2536]
2025-05-11 02:58:10,338 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 187.0, 80.0, 718.0, 1000.0, 258.0, 59.0, 85.0, 1000.0, 330.0]
2025-05-11 02:58:10,350 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 94/100 (estimated time remaining: 21 minutes, 42 seconds)
2025-05-11 03:01:02,684 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 03:01:11,892 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 907.32208 ± 625.683
2025-05-11 03:01:11,892 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [214.43642, 1377.5591, 1224.3021, 745.07056, 17.732025, 1986.7329, 1346.7635, 97.37654, 1385.3575, 677.89]
2025-05-11 03:01:11,892 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [140.0, 665.0, 512.0, 373.0, 18.0, 1000.0, 1000.0, 84.0, 1000.0, 403.0]
2025-05-11 03:01:11,905 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 95/100 (estimated time remaining: 18 minutes, 31 seconds)
2025-05-11 03:04:18,881 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 03:04:28,197 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 940.96936 ± 540.479
2025-05-11 03:04:28,198 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [513.3015, 105.8292, 1748.0087, 675.3037, 230.11925, 1666.1641, 1105.2461, 809.4555, 1367.7635, 1188.5021]
2025-05-11 03:04:28,198 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [193.0, 103.0, 1000.0, 361.0, 116.0, 962.0, 1000.0, 403.0, 703.0, 515.0]
2025-05-11 03:04:28,209 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 96/100 (estimated time remaining: 15 minutes, 39 seconds)
2025-05-11 03:07:23,501 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 03:07:29,320 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 708.25305 ± 718.837
2025-05-11 03:07:29,320 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [2141.6118, 191.61159, 1917.4216, 221.20432, 1060.2395, 22.730753, 472.0223, 263.492, 624.3398, 167.85674]
2025-05-11 03:07:29,320 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [906.0, 109.0, 970.0, 130.0, 459.0, 19.0, 269.0, 131.0, 348.0, 82.0]
2025-05-11 03:07:29,333 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 97/100 (estimated time remaining: 12 minutes, 25 seconds)
2025-05-11 03:10:15,107 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 03:10:24,920 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 814.63416 ± 524.384
2025-05-11 03:10:24,921 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [497.82513, 415.49744, 244.19235, 987.38153, 316.8683, 455.25436, 1084.6287, 1013.1625, 2074.3074, 1057.2245]
2025-05-11 03:10:24,921 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [282.0, 201.0, 152.0, 1000.0, 219.0, 252.0, 640.0, 715.0, 1000.0, 1000.0]
2025-05-11 03:10:24,933 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 98/100 (estimated time remaining: 9 minutes, 18 seconds)
2025-05-11 03:13:19,332 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 03:13:30,678 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 808.99329 ± 604.718
2025-05-11 03:13:30,678 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [62.27223, 699.97363, 1815.4103, 915.1566, 879.2658, 666.5011, 203.36981, 933.25867, 59.66434, 1855.0598]
2025-05-11 03:13:30,678 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [40.0, 1000.0, 1000.0, 1000.0, 326.0, 1000.0, 113.0, 515.0, 43.0, 1000.0]
2025-05-11 03:13:30,691 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 99/100 (estimated time remaining: 6 minutes, 8 seconds)
2025-05-11 03:16:26,861 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 03:16:36,577 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 991.27283 ± 736.080
2025-05-11 03:16:36,577 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [335.1786, 1319.3796, 59.419704, 204.9208, 1796.2155, 218.64357, 1653.471, 588.33234, 1932.6686, 1804.4993]
2025-05-11 03:16:36,577 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [202.0, 675.0, 34.0, 117.0, 1000.0, 128.0, 1000.0, 277.0, 1000.0, 1000.0]
2025-05-11 03:16:36,590 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 100/100 (estimated time remaining: 3 minutes, 4 seconds)
2025-05-11 03:19:33,705 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 03:19:45,191 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 1088.41333 ± 673.787
2025-05-11 03:19:45,191 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [1730.8895, 1440.5941, 1758.3622, 390.3986, 2268.7288, 390.47607, 1144.9326, 529.7939, 154.95993, 1074.9972]
2025-05-11 03:19:45,191 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 172.0, 1000.0, 261.0, 1000.0, 251.0, 77.0, 550.0]
2025-05-11 03:19:45,205 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1251 [DEBUG]: Training session finished
