2025-05-10 22:03:26,066 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-sac-aug-mem4
2025-05-10 22:03:26,066 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-sac-aug-mem4
2025-05-10 22:03:26,066 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x7c040b63df70>}
2025-05-10 22:03:26,066 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1111 [DEBUG]: using device: cpu
2025-05-10 22:03:26,066 baseline-sac-noisy-ant:77 [WARNING]: args.memorize_actions != args.horizon: 4 != 24
2025-05-10 22:03:26,071 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1133 [INFO]: Creating new trainer
2025-05-10 22:03:26,076 baseline-sac-noisy-ant:111 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=59, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1., -1., -1.]]))
)
2025-05-10 22:03:26,076 baseline-sac-noisy-ant:112 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=67, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-10 22:03:26,200 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1194 [DEBUG]: Starting training session...
2025-05-10 22:03:26,200 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 1/100
2025-05-10 22:06:06,132 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 22:06:26,440 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -2257.77808 ± 84.895
2025-05-10 22:06:26,440 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-2376.017, -2217.0833, -2299.9465, -2209.1406, -2317.4521, -2292.033, -2085.5452, -2239.4036, -2363.39, -2177.7695]
2025-05-10 22:06:26,440 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 22:06:26,440 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1226 [INFO]: New best (-2257.78) for latency MM1Queue_a033_s075
2025-05-10 22:06:26,440 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1229 [INFO]: saving network
2025-05-10 22:06:26,444 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 22:06:26,450 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 2/100 (estimated time remaining: 4 hours, 57 minutes, 24 seconds)
2025-05-10 22:09:21,570 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 22:09:24,260 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -101.42181 ± 231.313
2025-05-10 22:09:24,260 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [7.228071, -44.90333, -30.853754, -792.18396, 10.017963, -21.854692, -15.022126, -55.253525, -14.032875, -57.359818]
2025-05-10 22:09:24,260 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [20.0, 76.0, 53.0, 1000.0, 15.0, 28.0, 67.0, 93.0, 17.0, 124.0]
2025-05-10 22:09:24,261 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1226 [INFO]: New best (-101.42) for latency MM1Queue_a033_s075
2025-05-10 22:09:24,261 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1229 [INFO]: saving network
2025-05-10 22:09:24,264 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 22:09:24,281 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 3/100 (estimated time remaining: 4 hours, 52 minutes, 25 seconds)
2025-05-10 22:12:19,651 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 22:12:24,472 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -139.72536 ± 218.525
2025-05-10 22:12:24,473 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-21.852411, -9.417202, -591.1788, 2.9608457, -188.74382, -15.865414, -7.369622, -3.2853272, -27.724157, -534.77765]
2025-05-10 22:12:24,473 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [33.0, 43.0, 1000.0, 68.0, 287.0, 44.0, 72.0, 19.0, 55.0, 1000.0]
2025-05-10 22:12:24,474 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 4/100 (estimated time remaining: 4 hours, 50 minutes, 4 seconds)
2025-05-10 22:15:41,802 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 22:15:50,579 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -44.17646 ± 76.938
2025-05-10 22:15:50,580 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [35.296326, -4.8860817, 10.0843525, -9.0129175, -125.65537, -158.18088, -182.59761, 7.813722, 31.011972, -45.638157]
2025-05-10 22:15:50,580 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [133.0, 56.0, 141.0, 40.0, 1000.0, 1000.0, 1000.0, 45.0, 69.0, 143.0]
2025-05-10 22:15:50,580 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1226 [INFO]: New best (-44.18) for latency MM1Queue_a033_s075
2025-05-10 22:15:50,580 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1229 [INFO]: saving network
2025-05-10 22:15:50,585 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 22:15:50,595 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 5/100 (estimated time remaining: 4 hours, 57 minutes, 45 seconds)
2025-05-10 22:19:53,720 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 22:20:10,349 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 188.96368 ± 121.667
2025-05-10 22:20:10,349 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [19.395325, 301.28622, 289.12616, 332.84818, 136.37534, 300.71417, 276.8093, 178.49277, 44.18287, 10.406543]
2025-05-10 22:20:10,350 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [17.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 588.0, 90.0, 17.0]
2025-05-10 22:20:10,350 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1226 [INFO]: New best (188.96) for latency MM1Queue_a033_s075
2025-05-10 22:20:10,350 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1229 [INFO]: saving network
2025-05-10 22:20:10,355 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 22:20:10,363 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 6/100 (estimated time remaining: 5 hours, 17 minutes, 59 seconds)
2025-05-10 22:23:09,695 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 22:23:21,989 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 317.04584 ± 148.622
2025-05-10 22:23:21,989 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [204.73074, 605.28485, 250.33965, 54.06582, 375.13312, 361.2735, 422.64194, 432.77524, 171.7914, 292.42194]
2025-05-10 22:23:21,989 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [296.0, 1000.0, 410.0, 87.0, 922.0, 709.0, 840.0, 1000.0, 328.0, 1000.0]
2025-05-10 22:23:21,989 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1226 [INFO]: New best (317.05) for latency MM1Queue_a033_s075
2025-05-10 22:23:21,989 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1229 [INFO]: saving network
2025-05-10 22:23:21,993 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 22:23:22,000 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 7/100 (estimated time remaining: 5 hours, 18 minutes, 12 seconds)
2025-05-10 22:26:32,101 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 22:26:43,124 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 186.19058 ± 110.146
2025-05-10 22:26:43,125 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [280.28622, 208.55032, 299.14432, 124.38086, 243.0006, 274.96722, 67.13225, 27.899778, 317.27936, 19.2649]
2025-05-10 22:26:43,125 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 360.0, 1000.0, 281.0, 1000.0, 1000.0, 68.0, 87.0, 1000.0, 22.0]
2025-05-10 22:26:43,127 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 8/100 (estimated time remaining: 5 hours, 22 minutes, 2 seconds)
2025-05-10 22:29:35,230 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 22:29:45,583 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 192.46077 ± 154.416
2025-05-10 22:29:45,583 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [82.694496, 118.726494, 409.73053, 139.21443, 59.93025, 369.19394, 77.80345, 282.1217, -32.050903, 417.24326]
2025-05-10 22:29:45,583 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [189.0, 449.0, 1000.0, 572.0, 198.0, 1000.0, 137.0, 1000.0, 133.0, 1000.0]
2025-05-10 22:29:45,586 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 9/100 (estimated time remaining: 5 hours, 19 minutes, 16 seconds)
2025-05-10 22:32:45,425 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 22:32:52,303 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 168.52750 ± 152.720
2025-05-10 22:32:52,303 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [12.50321, 59.77129, 441.42322, 36.32573, 96.555305, 175.55547, 369.19678, 20.01231, 355.68802, 118.24361]
2025-05-10 22:32:52,303 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [17.0, 176.0, 1000.0, 67.0, 130.0, 206.0, 1000.0, 115.0, 1000.0, 274.0]
2025-05-10 22:32:52,306 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 10/100 (estimated time remaining: 5 hours, 9 minutes, 55 seconds)
2025-05-10 22:35:53,450 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 22:36:07,730 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 299.79327 ± 158.938
2025-05-10 22:36:07,730 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [444.82974, 371.9926, 387.5068, 62.779858, 453.0337, 50.191685, 393.6472, 373.7705, 67.741806, 392.43884]
2025-05-10 22:36:07,730 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 170.0, 1000.0, 133.0, 1000.0, 1000.0, 138.0, 1000.0]
2025-05-10 22:36:07,733 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 11/100 (estimated time remaining: 4 hours, 47 minutes, 12 seconds)
2025-05-10 22:38:55,888 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 22:39:08,291 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 210.42542 ± 143.865
2025-05-10 22:39:08,291 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [191.71747, 284.88336, 56.268932, 121.21754, 24.35902, 471.3337, 158.65515, 442.51407, 114.06585, 239.23912]
2025-05-10 22:39:08,291 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 72.0, 193.0, 45.0, 1000.0, 365.0, 1000.0, 1000.0, 585.0]
2025-05-10 22:39:08,294 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 12/100 (estimated time remaining: 4 hours, 40 minutes, 44 seconds)
2025-05-10 22:42:22,822 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 22:42:28,851 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 144.92416 ± 156.259
2025-05-10 22:42:28,851 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [64.59349, 33.6529, 155.741, 377.24014, 66.363174, 164.25337, 51.301533, 9.446932, 497.3583, 29.290848]
2025-05-10 22:42:28,851 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [103.0, 58.0, 515.0, 1000.0, 160.0, 264.0, 67.0, 68.0, 950.0, 41.0]
2025-05-10 22:42:28,854 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 13/100 (estimated time remaining: 4 hours, 37 minutes, 24 seconds)
2025-05-10 22:45:22,542 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 22:45:31,712 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 303.45270 ± 245.726
2025-05-10 22:45:31,712 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [188.43608, 38.175426, 599.5674, 183.11362, 416.95236, 42.012455, 687.0552, 636.21375, 201.24544, 41.755203]
2025-05-10 22:45:31,712 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [279.0, 44.0, 1000.0, 290.0, 1000.0, 45.0, 1000.0, 1000.0, 268.0, 47.0]
2025-05-10 22:45:31,715 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 14/100 (estimated time remaining: 4 hours, 34 minutes, 22 seconds)
2025-05-10 22:48:40,062 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 22:48:46,676 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 199.60104 ± 191.139
2025-05-10 22:48:46,676 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [170.14948, 114.82802, 135.8297, 530.38135, 93.9096, 597.2557, 223.43457, 62.07448, 44.711403, 23.43616]
2025-05-10 22:48:46,676 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [249.0, 178.0, 239.0, 1000.0, 141.0, 1000.0, 425.0, 163.0, 61.0, 62.0]
2025-05-10 22:48:46,680 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 15/100 (estimated time remaining: 4 hours, 33 minutes, 35 seconds)
2025-05-10 22:51:46,900 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 22:51:55,852 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 316.81021 ± 283.140
2025-05-10 22:51:55,852 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [646.73114, 697.2557, 111.567856, 13.130561, 85.5835, 513.04376, 222.21902, 743.19037, 8.922303, 126.45799]
2025-05-10 22:51:55,852 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 134.0, 29.0, 105.0, 1000.0, 476.0, 1000.0, 31.0, 102.0]
2025-05-10 22:51:55,856 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 16/100 (estimated time remaining: 4 hours, 28 minutes, 38 seconds)
2025-05-10 22:54:44,860 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 22:55:00,391 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 365.64313 ± 219.833
2025-05-10 22:55:00,391 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [576.8845, 621.71277, 187.36893, 92.04396, 588.09564, 163.57408, 24.3958, 605.3951, 394.61285, 402.34738]
2025-05-10 22:55:00,391 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 326.0, 89.0, 1000.0, 1000.0, 63.0, 1000.0, 1000.0, 1000.0]
2025-05-10 22:55:00,391 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1226 [INFO]: New best (365.64) for latency MM1Queue_a033_s075
2025-05-10 22:55:00,392 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1229 [INFO]: saving network
2025-05-10 22:55:00,396 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 22:55:00,405 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 17/100 (estimated time remaining: 4 hours, 26 minutes, 35 seconds)
2025-05-10 22:58:08,199 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 22:58:20,782 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 405.34433 ± 292.910
2025-05-10 22:58:20,782 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [166.66476, 76.057106, 31.966722, 688.52203, 344.75217, 660.4883, 697.0232, 684.75745, 694.5863, 8.625409]
2025-05-10 22:58:20,782 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [366.0, 140.0, 65.0, 1000.0, 511.0, 1000.0, 1000.0, 1000.0, 1000.0, 33.0]
2025-05-10 22:58:20,782 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1226 [INFO]: New best (405.34) for latency MM1Queue_a033_s075
2025-05-10 22:58:20,782 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1229 [INFO]: saving network
2025-05-10 22:58:20,787 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 22:58:20,796 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 18/100 (estimated time remaining: 4 hours, 23 minutes, 22 seconds)
2025-05-10 23:01:15,597 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 23:01:24,991 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 292.22086 ± 284.585
2025-05-10 23:01:24,991 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [68.70272, 83.575615, 30.860712, 703.7527, 63.29773, 46.333958, 657.7745, 682.67255, 492.3429, 92.89492]
2025-05-10 23:01:24,991 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [77.0, 128.0, 48.0, 1000.0, 124.0, 51.0, 1000.0, 1000.0, 1000.0, 148.0]
2025-05-10 23:01:24,995 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 19/100 (estimated time remaining: 4 hours, 20 minutes, 33 seconds)
2025-05-10 23:04:20,983 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 23:04:36,145 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 495.44391 ± 284.510
2025-05-10 23:04:36,146 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [736.48035, 125.21853, 2.9351153, 669.5016, 539.7576, 667.1545, 731.77844, 551.8846, 817.7618, 111.96607]
2025-05-10 23:04:36,146 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 151.0, 41.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 163.0]
2025-05-10 23:04:36,146 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1226 [INFO]: New best (495.44) for latency MM1Queue_a033_s075
2025-05-10 23:04:36,146 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1229 [INFO]: saving network
2025-05-10 23:04:36,150 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 23:04:36,160 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 20/100 (estimated time remaining: 4 hours, 16 minutes, 21 seconds)
2025-05-10 23:07:50,914 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 23:08:04,119 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 434.55304 ± 264.988
2025-05-10 23:08:04,119 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [89.17898, 707.4095, 654.54425, 687.57117, 22.18567, 109.859024, 288.44458, 715.1636, 530.0919, 541.0814]
2025-05-10 23:08:04,119 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [95.0, 1000.0, 1000.0, 1000.0, 26.0, 169.0, 341.0, 1000.0, 1000.0, 1000.0]
2025-05-10 23:08:04,124 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 21/100 (estimated time remaining: 4 hours, 18 minutes, 12 seconds)
2025-05-10 23:11:02,606 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 23:11:14,000 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 396.95490 ± 321.057
2025-05-10 23:11:14,000 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [599.0207, 773.8504, 102.33609, 181.7174, 718.5667, 50.791367, 22.430458, 722.2384, 749.5961, 49.00104]
2025-05-10 23:11:14,000 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 186.0, 202.0, 1000.0, 44.0, 21.0, 1000.0, 1000.0, 75.0]
2025-05-10 23:11:14,005 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 22/100 (estimated time remaining: 4 hours, 16 minutes, 22 seconds)
2025-05-10 23:14:02,799 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 23:14:20,283 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 600.56433 ± 250.792
2025-05-10 23:14:20,284 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [44.742516, 733.38116, 747.3963, 762.6159, 198.34995, 748.9878, 526.33167, 754.8383, 747.39856, 741.6012]
2025-05-10 23:14:20,284 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [57.0, 1000.0, 1000.0, 1000.0, 211.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 23:14:20,284 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1226 [INFO]: New best (600.56) for latency MM1Queue_a033_s075
2025-05-10 23:14:20,284 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1229 [INFO]: saving network
2025-05-10 23:14:20,289 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 23:14:20,298 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 23/100 (estimated time remaining: 4 hours, 9 minutes, 28 seconds)
2025-05-10 23:17:32,103 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 23:17:49,010 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 636.16821 ± 294.256
2025-05-10 23:17:49,011 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [793.8598, 765.99817, 741.6423, 807.4965, 800.3204, 850.0972, 69.24128, 745.526, 754.34576, 33.15474]
2025-05-10 23:17:49,011 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 81.0, 1000.0, 1000.0, 26.0]
2025-05-10 23:17:49,011 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1226 [INFO]: New best (636.17) for latency MM1Queue_a033_s075
2025-05-10 23:17:49,011 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1229 [INFO]: saving network
2025-05-10 23:17:49,015 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 23:17:49,025 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 24/100 (estimated time remaining: 4 hours, 12 minutes, 34 seconds)
2025-05-10 23:20:38,071 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 23:20:50,862 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 450.08759 ± 301.983
2025-05-10 23:20:50,862 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [748.998, 707.077, 125.96944, 717.40375, 519.82495, 82.24877, 732.93207, 98.817726, 721.2078, 46.396427]
2025-05-10 23:20:50,862 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 129.0, 1000.0, 768.0, 85.0, 1000.0, 112.0, 1000.0, 70.0]
2025-05-10 23:20:50,868 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 25/100 (estimated time remaining: 4 hours, 6 minutes, 55 seconds)
2025-05-10 23:24:02,932 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 23:24:10,210 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 259.04208 ± 313.112
2025-05-10 23:24:10,210 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [152.20927, 42.70136, 82.83737, 764.0454, 647.2053, 782.65717, 49.668243, 17.462463, 37.465794, 14.168403]
2025-05-10 23:24:10,210 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [247.0, 45.0, 93.0, 1000.0, 1000.0, 1000.0, 58.0, 29.0, 59.0, 21.0]
2025-05-10 23:24:10,216 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 26/100 (estimated time remaining: 4 hours, 1 minute, 31 seconds)
2025-05-10 23:27:08,861 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 23:27:18,637 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 395.06326 ± 338.827
2025-05-10 23:27:18,637 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [837.29156, 766.20416, 400.28055, 39.959244, 798.7914, 111.90021, 106.94961, 55.123413, 66.4333, 767.699]
2025-05-10 23:27:18,638 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 386.0, 45.0, 1000.0, 123.0, 111.0, 79.0, 56.0, 1000.0]
2025-05-10 23:27:18,643 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 27/100 (estimated time remaining: 3 hours, 57 minutes, 56 seconds)
2025-05-10 23:30:19,869 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 23:30:37,552 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 637.92151 ± 228.902
2025-05-10 23:30:37,552 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [236.87822, 858.98346, 701.5448, 752.3233, 631.2579, 805.00995, 156.48027, 748.1113, 708.3186, 780.30664]
2025-05-10 23:30:37,552 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [334.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 183.0, 1000.0, 1000.0, 1000.0]
2025-05-10 23:30:37,552 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1226 [INFO]: New best (637.92) for latency MM1Queue_a033_s075
2025-05-10 23:30:37,552 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1229 [INFO]: saving network
2025-05-10 23:30:37,556 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 23:30:37,568 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 28/100 (estimated time remaining: 3 hours, 57 minutes, 48 seconds)
2025-05-10 23:33:36,428 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 23:33:47,752 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 438.55078 ± 340.363
2025-05-10 23:33:47,752 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [250.51869, 775.9763, 51.94069, 700.36536, 838.5962, 787.15265, 59.522877, 77.42728, 79.173225, 764.8345]
2025-05-10 23:33:47,752 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [215.0, 1000.0, 49.0, 1000.0, 1000.0, 1000.0, 55.0, 88.0, 100.0, 1000.0]
2025-05-10 23:33:47,758 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 29/100 (estimated time remaining: 3 hours, 50 minutes, 5 seconds)
2025-05-10 23:36:49,747 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 23:37:03,459 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 481.55768 ± 300.622
2025-05-10 23:37:03,459 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [679.5976, 738.3438, 689.29816, 129.40407, 763.4833, 90.19287, 66.331154, 181.882, 785.8499, 691.19385]
2025-05-10 23:37:03,459 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 157.0, 1000.0, 100.0, 80.0, 350.0, 1000.0, 1000.0]
2025-05-10 23:37:03,465 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 30/100 (estimated time remaining: 3 hours, 50 minutes, 10 seconds)
2025-05-10 23:39:54,868 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 23:40:10,099 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 579.87421 ± 320.463
2025-05-10 23:40:10,099 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [41.510815, 775.0093, 10.132135, 780.6985, 760.7409, 804.99567, 784.1964, 796.86224, 800.9188, 243.67722]
2025-05-10 23:40:10,099 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [43.0, 1000.0, 25.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 243.0]
2025-05-10 23:40:10,105 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 31/100 (estimated time remaining: 3 hours, 43 minutes, 58 seconds)
2025-05-10 23:43:13,527 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 23:43:32,643 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 675.05109 ± 206.606
2025-05-10 23:43:32,643 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [762.01825, 60.64259, 683.35583, 765.37805, 743.3515, 748.41656, 711.49146, 745.95154, 742.5189, 787.38605]
2025-05-10 23:43:32,643 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 143.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 23:43:32,643 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1226 [INFO]: New best (675.05) for latency MM1Queue_a033_s075
2025-05-10 23:43:32,644 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1229 [INFO]: saving network
2025-05-10 23:43:32,647 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 23:43:32,662 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 32/100 (estimated time remaining: 3 hours, 44 minutes, 1 second)
2025-05-10 23:46:35,129 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 23:46:54,093 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 732.28693 ± 121.394
2025-05-10 23:46:54,094 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [775.6578, 798.82104, 767.7341, 778.86163, 762.08154, 772.0653, 369.3995, 763.5952, 762.9883, 771.66486]
2025-05-10 23:46:54,094 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 410.0, 1000.0, 1000.0, 1000.0]
2025-05-10 23:46:54,094 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1226 [INFO]: New best (732.29) for latency MM1Queue_a033_s075
2025-05-10 23:46:54,094 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1229 [INFO]: saving network
2025-05-10 23:46:54,098 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 23:46:54,110 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 33/100 (estimated time remaining: 3 hours, 41 minutes, 20 seconds)
2025-05-10 23:50:01,522 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 23:50:20,962 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 720.31989 ± 191.272
2025-05-10 23:50:20,962 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [819.90125, 851.12787, 796.62195, 837.90094, 621.6307, 820.5764, 822.9923, 805.0414, 197.32481, 630.081]
2025-05-10 23:50:20,962 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 205.0, 1000.0]
2025-05-10 23:50:20,970 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 34/100 (estimated time remaining: 3 hours, 41 minutes, 49 seconds)
2025-05-10 23:53:19,288 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 23:53:38,554 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 760.18207 ± 218.322
2025-05-10 23:53:38,554 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [848.2961, 846.893, 821.61035, 848.39325, 842.09045, 837.2786, 816.45795, 817.32886, 106.36149, 817.11066]
2025-05-10 23:53:38,554 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 120.0, 1000.0]
2025-05-10 23:53:38,554 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1226 [INFO]: New best (760.18) for latency MM1Queue_a033_s075
2025-05-10 23:53:38,554 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1229 [INFO]: saving network
2025-05-10 23:53:38,558 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 23:53:38,571 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 35/100 (estimated time remaining: 3 hours, 38 minutes, 55 seconds)
2025-05-10 23:56:36,261 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 23:56:51,921 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 626.24121 ± 315.348
2025-05-10 23:56:51,921 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [795.8324, 817.46216, 822.17474, 851.7034, 914.45917, 137.54942, 194.47375, 805.2656, 111.451836, 812.0393]
2025-05-10 23:56:51,921 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 127.0, 191.0, 1000.0, 123.0, 1000.0]
2025-05-10 23:56:51,929 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 36/100 (estimated time remaining: 3 hours, 37 minutes, 3 seconds)
2025-05-10 23:59:49,458 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 00:00:06,733 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 576.54590 ± 262.640
2025-05-11 00:00:06,733 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [649.7138, 662.53796, 810.8195, 829.8667, 648.9958, 814.3685, 659.55914, 522.07666, 101.849434, 65.671455]
2025-05-11 00:00:06,733 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 118.0, 48.0]
2025-05-11 00:00:06,741 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 37/100 (estimated time remaining: 3 hours, 32 minutes, 4 seconds)
2025-05-11 00:03:20,989 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 00:03:34,811 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 525.00269 ± 320.629
2025-05-11 00:03:34,812 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [821.79517, 183.34686, 83.75544, 818.64, 93.38079, 795.8551, 822.55707, 619.95404, 808.94226, 201.79988]
2025-05-11 00:03:34,812 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 191.0, 86.0, 1000.0, 110.0, 1000.0, 1000.0, 1000.0, 1000.0, 203.0]
2025-05-11 00:03:34,819 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 38/100 (estimated time remaining: 3 hours, 30 minutes, 8 seconds)
2025-05-11 00:06:29,675 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 00:06:47,276 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 710.62305 ± 272.188
2025-05-11 00:06:47,276 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [843.3596, 55.967434, 301.6091, 835.2304, 867.3627, 821.59644, 811.6788, 850.1644, 838.7003, 880.5617]
2025-05-11 00:06:47,276 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 44.0, 310.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 00:06:47,285 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 39/100 (estimated time remaining: 3 hours, 23 minutes, 50 seconds)
2025-05-11 00:09:37,310 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 00:09:52,789 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 643.85864 ± 333.438
2025-05-11 00:09:52,789 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [845.5313, 898.64, 884.86676, 56.732685, 908.5684, 183.03833, 817.0994, 820.61237, 176.46414, 847.0333]
2025-05-11 00:09:52,789 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 47.0, 1000.0, 213.0, 1000.0, 1000.0, 191.0, 1000.0]
2025-05-11 00:09:52,797 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 40/100 (estimated time remaining: 3 hours, 18 minutes, 5 seconds)
2025-05-11 00:13:02,445 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 00:13:19,572 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 671.99963 ± 304.203
2025-05-11 00:13:19,572 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [814.7611, 823.75385, 815.21497, 84.15635, 44.05563, 815.87616, 833.5718, 817.2454, 828.1046, 843.257]
2025-05-11 00:13:19,572 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 85.0, 37.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 00:13:19,580 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 41/100 (estimated time remaining: 3 hours, 17 minutes, 31 seconds)
2025-05-11 00:16:18,418 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 00:16:34,097 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 558.13409 ± 272.095
2025-05-11 00:16:34,097 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [111.53108, 251.6556, 638.68604, 655.5846, 651.6691, 850.6758, 792.4868, 842.4317, 122.591576, 664.02856]
2025-05-11 00:16:34,097 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [144.0, 187.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 172.0, 1000.0]
2025-05-11 00:16:34,106 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 42/100 (estimated time remaining: 3 hours, 14 minutes, 10 seconds)
2025-05-11 00:19:38,978 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 00:19:54,068 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 593.25427 ± 361.823
2025-05-11 00:19:54,068 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [635.83246, 838.0221, 61.73033, 860.9572, 880.6111, 50.94286, 867.2839, 833.7966, 37.231052, 866.1351]
2025-05-11 00:19:54,068 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 66.0, 1000.0, 1000.0, 53.0, 1000.0, 1000.0, 41.0, 1000.0]
2025-05-11 00:19:54,076 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 43/100 (estimated time remaining: 3 hours, 9 minutes, 19 seconds)
2025-05-11 00:22:47,230 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 00:23:05,769 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 799.66272 ± 176.667
2025-05-11 00:23:05,769 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [841.8851, 930.67267, 804.70294, 882.4833, 906.62634, 875.0536, 869.54553, 922.26154, 328.3395, 635.0568]
2025-05-11 00:23:05,769 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 363.0, 639.0]
2025-05-11 00:23:05,770 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1226 [INFO]: New best (799.66) for latency MM1Queue_a033_s075
2025-05-11 00:23:05,770 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1229 [INFO]: saving network
2025-05-11 00:23:05,775 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 00:23:05,788 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 44/100 (estimated time remaining: 3 hours, 5 minutes, 54 seconds)
2025-05-11 00:26:16,931 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 00:26:36,184 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 767.92419 ± 208.695
2025-05-11 00:26:36,184 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [853.9767, 803.2478, 831.5408, 835.7479, 834.0037, 824.1462, 143.7383, 835.68677, 852.6179, 864.53546]
2025-05-11 00:26:36,184 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 168.0, 1000.0, 1000.0, 1000.0]
2025-05-11 00:26:36,193 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 45/100 (estimated time remaining: 3 hours, 7 minutes, 18 seconds)
2025-05-11 00:29:26,492 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 00:29:42,708 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 669.60358 ± 264.701
2025-05-11 00:29:42,708 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [394.19748, 839.06177, 117.764626, 757.2128, 381.28247, 651.3686, 892.9791, 878.4275, 828.117, 955.624]
2025-05-11 00:29:42,708 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [371.0, 1000.0, 143.0, 1000.0, 368.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 00:29:42,718 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 46/100 (estimated time remaining: 3 hours, 14 seconds)
2025-05-11 00:32:36,123 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 00:32:51,732 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 636.96558 ± 327.697
2025-05-11 00:32:51,732 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [341.01315, 865.9454, 48.59191, 834.30444, 692.0627, 861.7853, 78.23085, 849.624, 885.98627, 912.1117]
2025-05-11 00:32:51,732 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [314.0, 1000.0, 43.0, 1000.0, 1000.0, 1000.0, 85.0, 1000.0, 1000.0, 1000.0]
2025-05-11 00:32:51,741 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 47/100 (estimated time remaining: 2 hours, 55 minutes, 58 seconds)
2025-05-11 00:35:54,942 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 00:36:12,882 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 761.29602 ± 239.963
2025-05-11 00:36:12,882 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [849.5493, 886.4447, 873.2234, 167.72615, 895.8527, 852.51666, 885.1689, 898.0471, 879.4263, 425.00418]
2025-05-11 00:36:12,882 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 177.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 447.0]
2025-05-11 00:36:12,892 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 48/100 (estimated time remaining: 2 hours, 52 minutes, 55 seconds)
2025-05-11 00:39:23,797 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 00:39:41,064 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 730.26105 ± 311.429
2025-05-11 00:39:41,064 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [864.9939, 92.06833, 884.4042, 905.19165, 896.4146, 882.4897, 898.9726, 870.7501, 883.399, 123.92651]
2025-05-11 00:39:41,064 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 70.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 108.0]
2025-05-11 00:39:41,073 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 49/100 (estimated time remaining: 2 hours, 52 minutes, 30 seconds)
2025-05-11 00:42:32,184 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 00:42:49,292 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 685.92505 ± 329.463
2025-05-11 00:42:49,292 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [876.4053, 882.6699, 844.58746, 879.9387, 25.0351, 840.34296, 691.0728, 863.3698, 907.437, 48.391438]
2025-05-11 00:42:49,292 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 30.0, 1000.0, 1000.0, 1000.0, 1000.0, 50.0]
2025-05-11 00:42:49,302 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 50/100 (estimated time remaining: 2 hours, 45 minutes, 25 seconds)
2025-05-11 00:45:50,186 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 00:46:07,407 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 710.12097 ± 323.019
2025-05-11 00:46:07,407 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [849.77637, 39.734173, 876.8579, 867.6037, 890.8873, 854.18787, 90.068016, 876.0624, 871.6847, 884.3476]
2025-05-11 00:46:07,407 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 41.0, 1000.0, 1000.0, 1000.0, 1000.0, 97.0, 1000.0, 1000.0, 1000.0]
2025-05-11 00:46:07,419 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 51/100 (estimated time remaining: 2 hours, 44 minutes, 7 seconds)
2025-05-11 00:49:04,460 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 00:49:19,224 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 592.16577 ± 325.410
2025-05-11 00:49:19,225 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [146.30835, 174.06226, 924.4936, 655.71356, 654.4033, 701.49396, 66.60234, 988.22754, 930.0376, 680.3156]
2025-05-11 00:49:19,225 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [190.0, 196.0, 1000.0, 1000.0, 1000.0, 1000.0, 85.0, 1000.0, 1000.0, 715.0]
2025-05-11 00:49:19,234 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 52/100 (estimated time remaining: 2 hours, 41 minutes, 17 seconds)
2025-05-11 00:52:23,013 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 00:52:42,142 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 782.32257 ± 259.936
2025-05-11 00:52:42,142 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [901.54553, 882.9039, 888.0936, 925.95483, 647.92365, 887.1192, 872.9228, 900.4359, 34.986237, 881.3396]
2025-05-11 00:52:42,143 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 41.0, 1000.0]
2025-05-11 00:52:42,153 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 53/100 (estimated time remaining: 2 hours, 38 minutes, 16 seconds)
2025-05-11 00:55:43,318 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 00:55:53,878 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 511.71088 ± 376.548
2025-05-11 00:55:53,878 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [943.0987, 64.67467, 271.8613, 81.3378, 918.13086, 918.876, 299.44794, 1007.08636, 102.09227, 510.50308]
2025-05-11 00:55:53,878 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 110.0, 275.0, 61.0, 1000.0, 1000.0, 259.0, 1000.0, 76.0, 519.0]
2025-05-11 00:55:53,889 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 54/100 (estimated time remaining: 2 hours, 32 minutes, 24 seconds)
2025-05-11 00:59:07,831 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 00:59:22,613 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 657.00958 ± 409.190
2025-05-11 00:59:22,613 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [34.692684, 979.0911, 879.97687, 38.41841, 852.1198, 900.9658, 922.85864, 32.517258, 923.78107, 1005.6736]
2025-05-11 00:59:22,613 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [35.0, 1000.0, 1000.0, 34.0, 1000.0, 1000.0, 1000.0, 34.0, 1000.0, 1000.0]
2025-05-11 00:59:22,624 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 55/100 (estimated time remaining: 2 hours, 32 minutes, 18 seconds)
2025-05-11 01:02:15,026 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 01:02:25,652 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 513.19708 ± 294.674
2025-05-11 01:02:25,652 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [439.3003, 798.01416, 35.000114, 891.2565, 204.96953, 733.6189, 476.40317, 107.84621, 803.5307, 642.03125]
2025-05-11 01:02:25,652 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [384.0, 1000.0, 45.0, 1000.0, 233.0, 643.0, 403.0, 99.0, 1000.0, 573.0]
2025-05-11 01:02:25,663 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 56/100 (estimated time remaining: 2 hours, 26 minutes, 44 seconds)
2025-05-11 01:05:31,776 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 01:05:52,076 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 870.26111 ± 12.518
2025-05-11 01:05:52,076 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [886.9547, 868.8251, 874.36383, 861.67316, 857.1993, 865.80304, 871.3035, 893.4069, 874.1171, 848.96387]
2025-05-11 01:05:52,076 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 01:05:52,076 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1226 [INFO]: New best (870.26) for latency MM1Queue_a033_s075
2025-05-11 01:05:52,076 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1229 [INFO]: saving network
2025-05-11 01:05:52,080 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 01:05:52,133 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 57/100 (estimated time remaining: 2 hours, 25 minutes, 37 seconds)
2025-05-11 01:08:51,606 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 01:09:07,113 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 765.72327 ± 373.276
2025-05-11 01:09:07,113 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [946.2587, 39.23488, 702.5652, 949.16364, 1075.941, 1049.3292, 47.902637, 989.4208, 939.4644, 917.9521]
2025-05-11 01:09:07,113 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 48.0, 1000.0, 1000.0, 1000.0, 1000.0, 40.0, 1000.0, 1000.0, 1000.0]
2025-05-11 01:09:07,124 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 58/100 (estimated time remaining: 2 hours, 21 minutes, 10 seconds)
2025-05-11 01:12:08,998 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 01:12:25,724 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 729.78314 ± 336.382
2025-05-11 01:12:25,724 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [923.2661, 887.3036, 882.5777, 902.0846, 861.1336, 890.49225, 915.1474, 57.43863, 919.8881, 58.49933]
2025-05-11 01:12:25,724 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 69.0, 1000.0, 50.0]
2025-05-11 01:12:25,735 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 59/100 (estimated time remaining: 2 hours, 18 minutes, 51 seconds)
2025-05-11 01:15:27,727 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 01:15:48,645 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 853.25372 ± 69.802
2025-05-11 01:15:48,646 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [654.85254, 889.1622, 881.54895, 868.2596, 894.83856, 861.2696, 920.62604, 833.2336, 870.0127, 858.7337]
2025-05-11 01:15:48,646 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 01:15:48,657 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 14 minutes, 45 seconds)
2025-05-11 01:18:52,017 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 01:19:11,167 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 855.45428 ± 229.466
2025-05-11 01:19:11,168 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [893.7948, 939.8787, 895.11865, 995.6588, 177.18806, 934.3857, 895.76196, 890.342, 1005.3729, 927.04144]
2025-05-11 01:19:11,168 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 180.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 01:19:11,180 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 61/100 (estimated time remaining: 2 hours, 14 minutes, 4 seconds)
2025-05-11 01:21:57,647 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 01:22:15,668 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 798.02277 ± 217.266
2025-05-11 01:22:15,669 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [856.3129, 976.65814, 950.9433, 965.695, 900.8715, 914.58777, 752.8207, 360.8215, 900.26996, 401.24655]
2025-05-11 01:22:15,669 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 372.0, 1000.0, 412.0]
2025-05-11 01:22:15,681 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 62/100 (estimated time remaining: 2 hours, 7 minutes, 51 seconds)
2025-05-11 01:25:19,716 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 01:25:36,887 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 764.85449 ± 308.097
2025-05-11 01:25:36,887 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [888.5392, 905.4735, 892.4888, 957.55, 953.48004, 906.83044, 925.8775, 33.05844, 289.12424, 896.123]
2025-05-11 01:25:36,887 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 34.0, 299.0, 1000.0]
2025-05-11 01:25:36,900 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 63/100 (estimated time remaining: 2 hours, 5 minutes, 22 seconds)
2025-05-11 01:28:37,830 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 01:28:58,736 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 919.52716 ± 27.224
2025-05-11 01:28:58,737 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [865.97974, 932.94977, 928.8208, 912.8838, 952.24927, 940.0746, 951.9321, 892.2921, 928.1379, 889.9511]
2025-05-11 01:28:58,737 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 01:28:58,737 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1226 [INFO]: New best (919.53) for latency MM1Queue_a033_s075
2025-05-11 01:28:58,737 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1229 [INFO]: saving network
2025-05-11 01:28:58,741 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 01:28:58,758 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 64/100 (estimated time remaining: 2 hours, 2 minutes, 28 seconds)
2025-05-11 01:32:07,325 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 01:32:26,705 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 854.30859 ± 178.853
2025-05-11 01:32:26,705 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [816.6537, 857.83185, 336.96405, 927.9833, 946.2742, 948.17334, 873.0354, 960.50653, 902.2571, 973.4063]
2025-05-11 01:32:26,705 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 324.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 01:32:26,718 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 59 minutes, 46 seconds)
2025-05-11 01:35:27,925 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 01:35:45,257 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 775.48438 ± 300.054
2025-05-11 01:35:45,257 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [276.53183, 933.9832, 90.9729, 909.06903, 919.3805, 896.6637, 981.6521, 943.85596, 870.4669, 932.26764]
2025-05-11 01:35:45,257 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [242.0, 1000.0, 104.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 01:35:45,270 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 55 minutes, 58 seconds)
2025-05-11 01:38:48,144 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 01:39:03,097 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 642.08966 ± 385.550
2025-05-11 01:39:03,097 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [910.01404, 881.90845, 24.131657, 909.4623, 836.96136, 990.50305, 797.0266, 114.97218, 918.8271, 37.08986]
2025-05-11 01:39:03,097 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 37.0, 1000.0, 1000.0, 1000.0, 1000.0, 114.0, 1000.0, 46.0]
2025-05-11 01:39:03,110 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 54 minutes, 10 seconds)
2025-05-11 01:41:57,972 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 01:42:16,871 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 834.34473 ± 283.219
2025-05-11 01:42:16,871 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [965.9938, 5.6550713, 745.49347, 939.7478, 948.6733, 931.8378, 953.4624, 949.14307, 923.69055, 979.75037]
2025-05-11 01:42:16,871 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 17.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 01:42:16,885 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 49 minutes, 59 seconds)
2025-05-11 01:45:18,209 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 01:45:33,657 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 719.15662 ± 356.782
2025-05-11 01:45:33,657 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [626.2073, 98.9086, 307.59747, 960.7248, 1072.827, 937.366, 1006.6716, 880.41486, 225.55194, 1075.2963]
2025-05-11 01:45:33,657 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 72.0, 280.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 214.0, 1000.0]
2025-05-11 01:45:33,670 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 46 minutes, 7 seconds)
2025-05-11 01:48:37,992 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 01:48:56,864 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 818.56964 ± 272.738
2025-05-11 01:48:56,864 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [977.4865, 915.0517, 957.9852, 807.163, 685.4272, 972.7262, 932.9426, 959.5556, 934.5626, 42.796135]
2025-05-11 01:48:56,864 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 53.0]
2025-05-11 01:48:56,878 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 42 minutes, 18 seconds)
2025-05-11 01:51:56,042 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 01:52:09,171 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 699.33655 ± 384.105
2025-05-11 01:52:09,171 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [974.7759, 1084.3842, 1082.9476, 1040.5266, 467.50122, 379.0819, 1050.1307, 50.95437, 700.2913, 162.77138]
2025-05-11 01:52:09,171 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 390.0, 339.0, 1000.0, 52.0, 701.0, 160.0]
2025-05-11 01:52:09,185 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 38 minutes, 23 seconds)
2025-05-11 01:54:59,720 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 01:55:18,711 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 906.34979 ± 220.175
2025-05-11 01:55:18,712 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [313.6731, 969.3267, 824.57465, 993.528, 1123.25, 1020.1508, 1035.0657, 784.1119, 1055.7681, 944.04974]
2025-05-11 01:55:18,712 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [282.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 01:55:18,726 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 34 minutes, 18 seconds)
2025-05-11 01:58:20,574 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 01:58:34,048 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 694.24774 ± 451.084
2025-05-11 01:58:34,048 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [558.83563, 81.86008, 953.26605, 980.1122, 1116.6439, 1031.9446, 1010.52435, 1154.0101, 27.987839, 27.29282]
2025-05-11 01:58:34,048 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [521.0, 87.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 35.0, 35.0]
2025-05-11 01:58:34,063 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 31 minutes, 12 seconds)
2025-05-11 02:01:36,182 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 02:01:56,774 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 950.24628 ± 141.858
2025-05-11 02:01:56,774 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [935.26746, 905.1163, 1129.5042, 1024.8204, 1022.33484, 700.35144, 1104.0137, 954.21344, 1029.1018, 697.73883]
2025-05-11 02:01:56,774 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 946.0, 1000.0, 1000.0, 1000.0]
2025-05-11 02:01:56,774 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1226 [INFO]: New best (950.25) for latency MM1Queue_a033_s075
2025-05-11 02:01:56,774 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1229 [INFO]: saving network
2025-05-11 02:01:56,777 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 02:01:56,797 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 28 minutes, 28 seconds)
2025-05-11 02:04:56,283 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 02:05:15,181 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 891.54382 ± 271.804
2025-05-11 02:05:15,181 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [935.56854, 1074.8134, 1074.9751, 1032.0245, 946.0647, 983.7297, 1116.7188, 139.54535, 744.53723, 867.4614]
2025-05-11 02:05:15,181 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 139.0, 1000.0, 1000.0]
2025-05-11 02:05:15,196 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 24 minutes, 47 seconds)
2025-05-11 02:08:13,380 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 02:08:33,320 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 1007.95642 ± 33.803
2025-05-11 02:08:33,320 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [966.93524, 942.45605, 1032.8577, 1006.5688, 1023.687, 1012.6726, 1052.0491, 1045.0259, 1021.78973, 975.5229]
2025-05-11 02:08:33,320 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 848.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 02:08:33,320 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1226 [INFO]: New best (1007.96) for latency MM1Queue_a033_s075
2025-05-11 02:08:33,320 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1229 [INFO]: saving network
2025-05-11 02:08:33,324 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 02:08:33,342 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 22 minutes)
2025-05-11 02:11:43,927 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 02:12:01,949 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 821.40613 ± 271.876
2025-05-11 02:12:01,950 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [51.856808, 967.9537, 973.94403, 987.4472, 963.7101, 941.2379, 728.19403, 939.48535, 922.99243, 737.2398]
2025-05-11 02:12:01,950 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [53.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 02:12:01,965 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 20 minutes, 15 seconds)
2025-05-11 02:14:58,922 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 02:15:14,497 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 773.76886 ± 392.478
2025-05-11 02:15:14,497 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [1119.5701, 1109.6858, 660.66516, 1027.004, 961.3934, 833.8822, 31.098907, 922.0677, 33.285816, 1039.0358]
2025-05-11 02:15:14,497 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [978.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 26.0, 1000.0, 31.0, 1000.0]
2025-05-11 02:15:14,512 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 16 minutes, 42 seconds)
2025-05-11 02:18:19,205 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 02:18:33,670 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 684.01062 ± 340.062
2025-05-11 02:18:33,670 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [700.9853, 864.89496, 1060.8549, 995.0786, 293.05713, 961.5376, 864.01135, 819.81287, 62.80632, 217.06688]
2025-05-11 02:18:33,670 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 926.0, 239.0, 1000.0, 1000.0, 729.0, 50.0, 187.0]
2025-05-11 02:18:33,687 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 13 minutes, 6 seconds)
2025-05-11 02:21:30,330 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 02:21:48,941 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 851.36346 ± 289.631
2025-05-11 02:21:48,941 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [15.447958, 971.93646, 963.52905, 1009.44507, 987.95825, 943.7636, 931.10846, 717.52844, 982.55975, 990.3577]
2025-05-11 02:21:48,941 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [39.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 02:21:48,957 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 9 minutes, 33 seconds)
2025-05-11 02:24:51,490 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 02:25:09,113 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 845.17578 ± 319.904
2025-05-11 02:25:09,113 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [684.6529, 817.50653, 1121.1405, 833.5819, 674.446, 1181.5106, 1243.1881, 65.62047, 880.374, 949.737]
2025-05-11 02:25:09,113 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [548.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 67.0, 1000.0, 1000.0]
2025-05-11 02:25:09,128 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 6 minutes, 23 seconds)
2025-05-11 02:28:17,299 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 02:28:33,785 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 847.11035 ± 351.270
2025-05-11 02:28:33,785 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [201.85814, 930.7122, 1102.7285, 1093.1163, 889.8209, 132.3736, 1103.5879, 863.4129, 1060.619, 1092.874]
2025-05-11 02:28:33,785 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [198.0, 829.0, 1000.0, 1000.0, 1000.0, 113.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 02:28:33,801 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 82/100 (estimated time remaining: 1 hour, 2 minutes, 48 seconds)
2025-05-11 02:31:23,560 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 02:31:41,604 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 840.55029 ± 240.461
2025-05-11 02:31:41,604 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [1131.9692, 999.4168, 710.84906, 379.03125, 755.4991, 1063.7943, 783.6153, 1042.5398, 1021.3767, 517.4115]
2025-05-11 02:31:41,604 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 965.0, 1000.0, 381.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 459.0]
2025-05-11 02:31:41,620 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 83/100 (estimated time remaining: 59 minutes, 13 seconds)
2025-05-11 02:34:40,905 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 02:34:57,020 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 805.03595 ± 384.904
2025-05-11 02:34:57,020 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [711.33563, 1134.3213, 877.2934, 118.31017, 739.0575, 1175.2156, 117.021965, 1173.9382, 830.00714, 1173.8577]
2025-05-11 02:34:57,020 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 709.0, 114.0, 1000.0, 1000.0, 92.0, 1000.0, 1000.0, 1000.0]
2025-05-11 02:34:57,036 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 84/100 (estimated time remaining: 55 minutes, 43 seconds)
2025-05-11 02:37:58,264 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 02:38:16,562 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 909.83118 ± 291.594
2025-05-11 02:38:16,562 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [1019.7194, 987.60504, 1047.1675, 915.2833, 1084.1544, 1018.84717, 45.099102, 1031.4131, 980.6026, 968.41956]
2025-05-11 02:38:16,562 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 47.0, 1000.0, 1000.0, 1000.0]
2025-05-11 02:38:16,578 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 85/100 (estimated time remaining: 52 minutes, 40 seconds)
2025-05-11 02:41:18,033 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 02:41:38,627 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 1056.15015 ± 211.705
2025-05-11 02:41:38,627 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [1103.6753, 1205.6421, 739.7073, 1169.5073, 1283.5448, 1279.1066, 823.4979, 749.29956, 935.88873, 1271.6321]
2025-05-11 02:41:38,627 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 02:41:38,627 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1226 [INFO]: New best (1056.15) for latency MM1Queue_a033_s075
2025-05-11 02:41:38,627 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1229 [INFO]: saving network
2025-05-11 02:41:38,631 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 02:41:38,653 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 86/100 (estimated time remaining: 49 minutes, 28 seconds)
2025-05-11 02:44:36,519 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 02:44:54,986 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 1000.49353 ± 208.201
2025-05-11 02:44:54,987 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [795.2732, 686.3132, 1313.9365, 903.20886, 1278.107, 1053.3445, 1027.7389, 1244.7883, 854.74457, 847.4804]
2025-05-11 02:44:54,987 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 712.0, 1000.0, 1000.0, 1000.0, 1000.0, 693.0, 707.0]
2025-05-11 02:44:55,004 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 87/100 (estimated time remaining: 45 minutes, 47 seconds)
2025-05-11 02:47:58,385 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 02:48:15,458 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 925.11902 ± 289.061
2025-05-11 02:48:15,458 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [910.7238, 526.4287, 1104.3357, 1169.6509, 1146.9783, 1056.9077, 1092.3434, 242.41136, 1079.4939, 921.9173]
2025-05-11 02:48:15,458 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 421.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 235.0, 1000.0, 887.0]
2025-05-11 02:48:15,475 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 88/100 (estimated time remaining: 43 minutes, 4 seconds)
2025-05-11 02:51:25,323 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 02:51:41,420 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 1018.35370 ± 385.962
2025-05-11 02:51:41,420 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [854.0743, 1180.4454, 1171.1157, 1284.7576, 1322.9913, 596.98065, 56.36309, 1252.4915, 1226.4983, 1237.8192]
2025-05-11 02:51:41,420 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [645.0, 1000.0, 1000.0, 1000.0, 1000.0, 471.0, 56.0, 1000.0, 1000.0, 1000.0]
2025-05-11 02:51:41,437 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 89/100 (estimated time remaining: 40 minutes, 10 seconds)
2025-05-11 02:54:40,240 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 02:54:56,138 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 886.92902 ± 314.225
2025-05-11 02:54:56,138 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [655.5211, 1223.7034, 1005.8326, 1190.506, 781.01733, 1185.7283, 565.2333, 241.32169, 1187.576, 832.851]
2025-05-11 02:54:56,138 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 887.0, 1000.0, 733.0, 1000.0, 500.0, 198.0, 983.0, 1000.0]
2025-05-11 02:54:56,155 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 90/100 (estimated time remaining: 36 minutes, 39 seconds)
2025-05-11 02:57:54,385 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 02:58:08,132 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 870.45862 ± 401.508
2025-05-11 02:58:08,132 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [426.90607, 58.986496, 1152.4811, 1121.453, 1242.794, 490.75027, 681.2084, 1175.5476, 1092.3695, 1262.0896]
2025-05-11 02:58:08,132 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [345.0, 52.0, 1000.0, 1000.0, 1000.0, 368.0, 553.0, 1000.0, 1000.0, 1000.0]
2025-05-11 02:58:08,148 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 91/100 (estimated time remaining: 32 minutes, 58 seconds)
2025-05-11 03:01:17,566 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 03:01:37,831 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 1158.57251 ± 89.954
2025-05-11 03:01:37,831 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [1209.7864, 926.2045, 1182.2999, 1235.2524, 1100.5729, 1174.6746, 1120.2465, 1186.6548, 1185.3519, 1264.6815]
2025-05-11 03:01:37,832 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 03:01:37,832 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1226 [INFO]: New best (1158.57) for latency MM1Queue_a033_s075
2025-05-11 03:01:37,832 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1229 [INFO]: saving network
2025-05-11 03:01:37,836 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 03:01:37,859 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 92/100 (estimated time remaining: 30 minutes, 5 seconds)
2025-05-11 03:04:26,311 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 03:04:42,652 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 1002.94940 ± 390.611
2025-05-11 03:04:42,652 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [1092.5992, 478.58408, 1217.3251, 1193.134, 1153.3247, 1189.4377, 1269.308, 1182.0648, 1225.748, 27.968895]
2025-05-11 03:04:42,652 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [911.0, 382.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 28.0]
2025-05-11 03:04:42,670 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 93/100 (estimated time remaining: 26 minutes, 19 seconds)
2025-05-11 03:07:40,164 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 03:07:57,665 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 1053.55542 ± 246.712
2025-05-11 03:07:57,666 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [1119.5096, 1230.0404, 1202.7706, 455.71384, 1185.8118, 1150.16, 750.47705, 1256.4353, 1216.104, 968.53314]
2025-05-11 03:07:57,666 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 383.0, 1000.0, 1000.0, 677.0, 1000.0, 1000.0, 836.0]
2025-05-11 03:07:57,684 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 94/100 (estimated time remaining: 22 minutes, 46 seconds)
2025-05-11 03:10:58,104 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 03:11:14,573 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 868.58221 ± 433.666
2025-05-11 03:11:14,573 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [1143.3516, 63.737747, 1119.8214, 1149.7178, 1074.374, 1162.0498, 32.267506, 1107.8114, 1166.8903, 665.80096]
2025-05-11 03:11:14,573 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 60.0, 1000.0, 1000.0, 1000.0, 1000.0, 36.0, 1000.0, 1000.0, 1000.0]
2025-05-11 03:11:14,592 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 95/100 (estimated time remaining: 19 minutes, 34 seconds)
2025-05-11 03:14:17,591 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 03:14:35,707 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 1063.81409 ± 348.392
2025-05-11 03:14:35,708 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [1143.3948, 1129.7228, 1187.6605, 1237.2207, 24.041655, 1204.0762, 1237.2917, 1160.2024, 1161.414, 1153.1157]
2025-05-11 03:14:35,708 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 31.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 03:14:35,726 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 96/100 (estimated time remaining: 16 minutes, 27 seconds)
2025-05-11 03:17:37,973 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 03:17:58,341 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 1044.98486 ± 81.761
2025-05-11 03:17:58,341 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [1063.5338, 1056.2329, 1075.4131, 1062.3838, 818.05225, 1049.3495, 1125.4012, 1118.5717, 1069.6841, 1011.2275]
2025-05-11 03:17:58,341 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 03:17:58,360 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 97/100 (estimated time remaining: 13 minutes, 4 seconds)
2025-05-11 03:21:02,784 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 03:21:20,468 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 939.27185 ± 291.603
2025-05-11 03:21:20,468 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [1180.2999, 237.49924, 1103.775, 913.4975, 610.56323, 1171.4576, 1182.1499, 923.6189, 907.4128, 1162.4443]
2025-05-11 03:21:20,468 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 174.0, 884.0, 1000.0, 484.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 03:21:20,487 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 98/100 (estimated time remaining: 9 minutes, 58 seconds)
2025-05-11 03:24:40,775 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 03:25:00,078 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 1077.63000 ± 230.943
2025-05-11 03:25:00,078 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [1155.4951, 1286.5393, 448.2772, 995.6531, 1199.5833, 950.86835, 1208.505, 1185.5526, 1134.4434, 1211.382]
2025-05-11 03:25:00,078 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 352.0, 1000.0, 1000.0, 747.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 03:25:00,099 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 99/100 (estimated time remaining: 6 minutes, 48 seconds)
2025-05-11 03:28:10,508 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 03:28:29,536 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 1004.70178 ± 272.802
2025-05-11 03:28:29,537 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [1185.5348, 824.0052, 1288.7549, 1196.7921, 1235.272, 1201.8911, 395.84872, 946.4543, 703.11786, 1069.3472]
2025-05-11 03:28:29,537 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 929.0, 1000.0, 1000.0, 331.0, 786.0, 1000.0, 860.0]
2025-05-11 03:28:29,558 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 100/100 (estimated time remaining: 3 minutes, 26 seconds)
2025-05-11 03:31:51,917 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 03:32:13,067 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 952.06604 ± 195.810
2025-05-11 03:32:13,067 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [1121.025, 1068.3341, 762.7139, 1067.642, 1061.608, 1052.5627, 1090.98, 707.73883, 1059.633, 528.42267]
2025-05-11 03:32:13,067 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 730.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 03:32:13,089 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1251 [DEBUG]: Training session finished
