2025-05-07 11:27:31,999 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1006 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-sac-aug-mem4
2025-05-07 11:27:32,000 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1007 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-sac-aug-mem4
2025-05-07 11:27:32,000 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1008 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x7e7f5c9c3f10>}
2025-05-07 11:27:32,000 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1009 [DEBUG]: using device: cpu
2025-05-07 11:27:32,000 baseline-sac-noisy-ant:77 [WARNING]: args.memorize_actions != args.horizon: 4 != 32
2025-05-07 11:27:32,008 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1031 [INFO]: Creating new trainer
2025-05-07 11:27:32,017 baseline-sac-noisy-ant:111 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=59, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1., -1., -1.]]))
)
2025-05-07 11:27:32,018 baseline-sac-noisy-ant:112 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=67, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-07 11:27:32,229 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1092 [DEBUG]: Starting training session...
2025-05-07 11:27:32,229 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 1/100
2025-05-07 11:30:15,443 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 11:30:25,391 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -1107.65344 ± 1091.297
2025-05-07 11:30:25,391 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-2208.5344, -17.647024, -17.510275, -17.76506, -14.862855, -2213.9304, -20.043207, -2079.717, -2306.13, -2180.3938]
2025-05-07 11:30:25,391 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 15.0, 19.0, 21.0, 16.0, 1000.0, 22.0, 1000.0, 1000.0, 1000.0]
2025-05-07 11:30:25,391 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1124 [INFO]: New best (-1107.65) for latency ExtremeSparseL4U32
2025-05-07 11:30:25,392 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1127 [INFO]: saving network
2025-05-07 11:30:25,395 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 11:30:25,401 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 2/100 (estimated time remaining: 4 hours, 45 minutes, 44 seconds)
2025-05-07 11:33:36,537 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 11:33:39,527 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -101.69664 ± 157.316
2025-05-07 11:33:39,527 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-68.579865, -221.03584, -0.89297247, -15.921202, -124.198204, -64.444756, -2.4222302, -6.247186, 13.35864, -526.5828]
2025-05-07 11:33:39,527 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [151.0, 214.0, 22.0, 36.0, 203.0, 54.0, 22.0, 117.0, 49.0, 1000.0]
2025-05-07 11:33:39,527 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1124 [INFO]: New best (-101.70) for latency ExtremeSparseL4U32
2025-05-07 11:33:39,528 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1127 [INFO]: saving network
2025-05-07 11:33:39,532 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 11:33:39,537 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 3/100 (estimated time remaining: 4 hours, 59 minutes, 58 seconds)
2025-05-07 11:36:38,076 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 11:36:41,068 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -51.64338 ± 144.153
2025-05-07 11:36:41,068 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-7.9776826, 0.7340766, -1.640756, -1.1271707, -91.49541, 6.5963655, -17.159914, -24.512781, -466.46664, 86.61616]
2025-05-07 11:36:41,068 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [44.0, 13.0, 26.0, 18.0, 131.0, 71.0, 172.0, 61.0, 1000.0, 192.0]
2025-05-07 11:36:41,069 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1124 [INFO]: New best (-51.64) for latency ExtremeSparseL4U32
2025-05-07 11:36:41,069 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1127 [INFO]: saving network
2025-05-07 11:36:41,073 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 11:36:41,079 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 4/100 (estimated time remaining: 4 hours, 55 minutes, 46 seconds)
2025-05-07 11:39:22,571 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 11:39:24,648 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -13.89268 ± 19.669
2025-05-07 11:39:24,649 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-36.89115, -5.9610796, -35.747894, -11.643281, 6.3126616, 27.142578, -5.397291, -31.848991, -31.420591, -13.471763]
2025-05-07 11:39:24,649 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [234.0, 29.0, 173.0, 266.0, 30.0, 102.0, 144.0, 65.0, 264.0, 40.0]
2025-05-07 11:39:24,649 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1124 [INFO]: New best (-13.89) for latency ExtremeSparseL4U32
2025-05-07 11:39:24,649 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1127 [INFO]: saving network
2025-05-07 11:39:24,653 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 11:39:24,659 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 5/100 (estimated time remaining: 4 hours, 44 minutes, 58 seconds)
2025-05-07 11:42:31,855 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 11:42:34,480 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 35.96136 ± 21.903
2025-05-07 11:42:34,480 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [38.316708, 42.1255, 20.93832, 35.994896, 28.687443, -8.773271, 18.097301, 71.784615, 61.266273, 51.17581]
2025-05-07 11:42:34,480 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [147.0, 317.0, 61.0, 314.0, 47.0, 263.0, 46.0, 181.0, 224.0, 78.0]
2025-05-07 11:42:34,480 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1124 [INFO]: New best (35.96) for latency ExtremeSparseL4U32
2025-05-07 11:42:34,480 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1127 [INFO]: saving network
2025-05-07 11:42:34,484 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 11:42:34,491 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 6/100 (estimated time remaining: 4 hours, 45 minutes, 42 seconds)
2025-05-07 11:45:21,278 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 11:45:26,546 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 15.97577 ± 32.480
2025-05-07 11:45:26,546 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-6.205304, -21.210217, -38.345005, 46.23597, 46.49636, 64.6943, 8.696149, 14.924087, -3.9452107, 48.416595]
2025-05-07 11:45:26,546 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [63.0, 261.0, 195.0, 121.0, 136.0, 1000.0, 581.0, 239.0, 302.0, 185.0]
2025-05-07 11:45:26,548 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 7/100 (estimated time remaining: 4 hours, 42 minutes, 21 seconds)
2025-05-07 11:48:20,148 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 11:48:24,522 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 1.45377 ± 37.195
2025-05-07 11:48:24,522 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [1.377287, 8.80616, 54.402294, -10.592338, 29.032768, -47.93919, 19.379032, -76.9845, 35.476246, 1.5799272]
2025-05-07 11:48:24,523 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [114.0, 184.0, 284.0, 948.0, 239.0, 270.0, 151.0, 312.0, 74.0, 56.0]
2025-05-07 11:48:24,524 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 8/100 (estimated time remaining: 4 hours, 34 minutes, 20 seconds)
2025-05-07 11:51:18,727 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 11:51:25,699 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -44.00981 ± 41.227
2025-05-07 11:51:25,700 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-24.485218, -148.51334, -4.618005, -45.56576, -18.9309, -79.375694, -26.629255, -53.63042, -1.3888923, -36.960617]
2025-05-07 11:51:25,700 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [316.0, 1000.0, 15.0, 1000.0, 57.0, 1000.0, 104.0, 260.0, 15.0, 119.0]
2025-05-07 11:51:25,701 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 9/100 (estimated time remaining: 4 hours, 31 minutes, 17 seconds)
2025-05-07 11:54:24,725 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 11:54:30,958 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 21.31784 ± 28.506
2025-05-07 11:54:30,959 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [13.001783, 76.9144, 13.358566, 39.332745, -31.961405, 4.458733, 18.529953, 14.7736635, 8.309262, 56.460655]
2025-05-07 11:54:30,959 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [148.0, 276.0, 83.0, 143.0, 1000.0, 691.0, 47.0, 55.0, 1000.0, 101.0]
2025-05-07 11:54:30,961 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 10/100 (estimated time remaining: 4 hours, 34 minutes, 54 seconds)
2025-05-07 11:57:36,426 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 11:57:41,898 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: -1.83381 ± 22.368
2025-05-07 11:57:41,898 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [28.568867, -7.3166766, 7.76019, 13.017801, -43.843147, -8.348402, -33.592255, 9.272437, 25.403662, -9.260545]
2025-05-07 11:57:41,898 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 213.0, 173.0, 181.0, 1000.0, 134.0, 153.0, 57.0, 151.0, 61.0]
2025-05-07 11:57:41,900 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 11/100 (estimated time remaining: 4 hours, 32 minutes, 13 seconds)
2025-05-07 12:00:30,235 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:00:34,407 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 25.86533 ± 60.243
2025-05-07 12:00:34,408 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [-9.263636, 1.5783213, 86.12177, -69.88527, 166.03441, 1.0971167, 15.572722, 51.310375, 11.144868, 4.942571]
2025-05-07 12:00:34,408 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [148.0, 23.0, 1000.0, 427.0, 474.0, 29.0, 132.0, 136.0, 39.0, 54.0]
2025-05-07 12:00:34,410 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 12/100 (estimated time remaining: 4 hours, 29 minutes, 19 seconds)
2025-05-07 12:03:22,969 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:03:25,724 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 26.54752 ± 31.792
2025-05-07 12:03:25,725 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [45.093224, 55.395992, 43.252754, -8.948159, -31.075985, 14.896439, 5.7532616, 16.231594, 42.181984, 82.6941]
2025-05-07 12:03:25,725 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [130.0, 219.0, 117.0, 261.0, 337.0, 42.0, 94.0, 50.0, 181.0, 351.0]
2025-05-07 12:03:25,727 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 13/100 (estimated time remaining: 4 hours, 24 minutes, 21 seconds)
2025-05-07 12:06:33,157 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:06:38,896 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 33.23283 ± 84.474
2025-05-07 12:06:38,896 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [36.67136, 11.3439455, -15.427197, -73.0192, 24.408045, 98.81908, 176.54384, -52.04542, -45.432396, 170.46623]
2025-05-07 12:06:38,897 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [204.0, 35.0, 93.0, 167.0, 162.0, 312.0, 1000.0, 301.0, 131.0, 1000.0]
2025-05-07 12:06:38,899 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 14/100 (estimated time remaining: 4 hours, 24 minutes, 49 seconds)
2025-05-07 12:09:27,895 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:09:34,773 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 88.00305 ± 147.419
2025-05-07 12:09:34,773 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [123.37276, -1.6610568, -16.216501, -36.135624, 7.6666293, 40.360462, 13.237758, 20.50946, 435.3307, 293.56592]
2025-05-07 12:09:34,773 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 115.0, 103.0, 63.0, 24.0, 346.0, 51.0, 184.0, 1000.0, 1000.0]
2025-05-07 12:09:34,773 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1124 [INFO]: New best (88.00) for latency ExtremeSparseL4U32
2025-05-07 12:09:34,774 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1127 [INFO]: saving network
2025-05-07 12:09:34,778 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 12:09:34,785 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 15/100 (estimated time remaining: 4 hours, 19 minutes, 5 seconds)
2025-05-07 12:12:34,029 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:12:42,567 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 103.05135 ± 141.043
2025-05-07 12:12:42,567 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [122.21622, -35.70733, 384.46033, 125.12672, 326.296, 20.70882, -52.831905, -23.99995, 53.013985, 111.23056]
2025-05-07 12:12:42,567 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [296.0, 350.0, 1000.0, 502.0, 1000.0, 144.0, 126.0, 1000.0, 124.0, 430.0]
2025-05-07 12:12:42,568 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1124 [INFO]: New best (103.05) for latency ExtremeSparseL4U32
2025-05-07 12:12:42,568 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1127 [INFO]: saving network
2025-05-07 12:12:42,577 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 12:12:42,585 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 16/100 (estimated time remaining: 4 hours, 15 minutes, 11 seconds)
2025-05-07 12:15:30,072 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:15:34,679 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 79.02321 ± 121.886
2025-05-07 12:15:34,679 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [296.3407, 68.2363, -10.40945, 15.565667, 19.62282, 26.8828, -6.094307, -7.2690516, 338.52997, 48.826675]
2025-05-07 12:15:34,679 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 108.0, 19.0, 57.0, 29.0, 122.0, 124.0, 54.0, 1000.0, 101.0]
2025-05-07 12:15:34,682 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 17/100 (estimated time remaining: 4 hours, 12 minutes, 4 seconds)
2025-05-07 12:18:23,255 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:18:28,406 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 105.62346 ± 125.130
2025-05-07 12:18:28,407 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [16.869457, 108.400696, -0.2616476, 77.73002, 353.69467, 172.379, 18.070126, -13.601569, 15.555735, 307.39807]
2025-05-07 12:18:28,407 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [50.0, 280.0, 36.0, 80.0, 1000.0, 522.0, 110.0, 48.0, 159.0, 783.0]
2025-05-07 12:18:28,407 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1124 [INFO]: New best (105.62) for latency ExtremeSparseL4U32
2025-05-07 12:18:28,407 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1127 [INFO]: saving network
2025-05-07 12:18:28,411 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 12:18:28,419 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 18/100 (estimated time remaining: 4 hours, 9 minutes, 44 seconds)
2025-05-07 12:21:22,111 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:21:28,418 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 107.93235 ± 131.540
2025-05-07 12:21:28,419 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [56.112217, 394.51736, 61.912945, 11.165395, 63.921257, 336.02856, 78.55994, -6.509283, 47.33809, 36.27692]
2025-05-07 12:21:28,419 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [81.0, 1000.0, 155.0, 18.0, 302.0, 1000.0, 283.0, 93.0, 407.0, 311.0]
2025-05-07 12:21:28,419 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1124 [INFO]: New best (107.93) for latency ExtremeSparseL4U32
2025-05-07 12:21:28,419 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1127 [INFO]: saving network
2025-05-07 12:21:28,423 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 12:21:28,430 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 19/100 (estimated time remaining: 4 hours, 3 minutes, 8 seconds)
2025-05-07 12:24:17,847 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:24:24,260 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 90.21323 ± 103.769
2025-05-07 12:24:24,260 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [51.027218, 247.56728, 14.974115, 58.552288, 68.40632, 161.25754, 21.595793, -3.4214723, -17.888597, 300.0619]
2025-05-07 12:24:24,260 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [143.0, 1000.0, 291.0, 135.0, 320.0, 687.0, 36.0, 46.0, 82.0, 1000.0]
2025-05-07 12:24:24,263 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 20/100 (estimated time remaining: 4 hours, 9 seconds)
2025-05-07 12:27:27,577 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:27:33,283 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 124.57214 ± 140.810
2025-05-07 12:27:33,283 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [37.22752, 36.9886, 40.453796, 3.7125673, 93.85477, 98.918396, 310.757, 151.278, 459.1565, 13.374232]
2025-05-07 12:27:33,283 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [113.0, 104.0, 45.0, 14.0, 281.0, 276.0, 1000.0, 394.0, 1000.0, 49.0]
2025-05-07 12:27:33,283 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1124 [INFO]: New best (124.57) for latency ExtremeSparseL4U32
2025-05-07 12:27:33,283 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1127 [INFO]: saving network
2025-05-07 12:27:33,287 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 12:27:33,295 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 21/100 (estimated time remaining: 3 hours, 57 minutes, 31 seconds)
2025-05-07 12:30:20,490 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:30:30,380 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 212.89893 ± 184.609
2025-05-07 12:30:30,380 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [37.376118, 299.2686, 43.712696, 381.6675, 34.969376, 49.216846, 464.6271, 476.81613, 11.935078, 329.39987]
2025-05-07 12:30:30,380 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [77.0, 1000.0, 75.0, 1000.0, 33.0, 78.0, 1000.0, 1000.0, 25.0, 1000.0]
2025-05-07 12:30:30,381 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1124 [INFO]: New best (212.90) for latency ExtremeSparseL4U32
2025-05-07 12:30:30,381 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1127 [INFO]: saving network
2025-05-07 12:30:30,385 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 12:30:30,392 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 22/100 (estimated time remaining: 3 hours, 55 minutes, 52 seconds)
2025-05-07 12:33:26,627 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:33:32,162 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 126.21684 ± 150.515
2025-05-07 12:33:32,162 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [161.659, 288.43677, 38.543266, 40.193913, 106.63672, 506.03827, 15.059817, 21.840809, 71.59715, 12.162677]
2025-05-07 12:33:32,162 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [315.0, 1000.0, 37.0, 175.0, 158.0, 1000.0, 18.0, 54.0, 347.0, 50.0]
2025-05-07 12:33:32,166 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 23/100 (estimated time remaining: 3 hours, 54 minutes, 58 seconds)
2025-05-07 12:36:15,203 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:36:24,784 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 232.43901 ± 199.800
2025-05-07 12:36:24,785 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [526.1462, 516.10077, 401.776, 84.656746, 77.521515, 15.653965, 415.3679, 52.082874, 21.18561, 213.89867]
2025-05-07 12:36:24,785 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 103.0, 255.0, 261.0, 1000.0, 78.0, 31.0, 523.0]
2025-05-07 12:36:24,785 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1124 [INFO]: New best (232.44) for latency ExtremeSparseL4U32
2025-05-07 12:36:24,785 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1127 [INFO]: saving network
2025-05-07 12:36:24,789 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 12:36:24,797 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 24/100 (estimated time remaining: 3 hours, 50 minutes, 4 seconds)
2025-05-07 12:39:19,155 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:39:23,084 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 109.63104 ± 134.680
2025-05-07 12:39:23,085 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [118.92613, 25.945835, 142.7379, 30.766958, 148.79172, 14.706842, 81.83012, 58.096046, 482.646, -8.13706]
2025-05-07 12:39:23,085 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [174.0, 63.0, 309.0, 108.0, 391.0, 29.0, 116.0, 85.0, 1000.0, 193.0]
2025-05-07 12:39:23,088 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 25/100 (estimated time remaining: 3 hours, 47 minutes, 42 seconds)
2025-05-07 12:42:16,619 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:42:29,693 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 342.01202 ± 188.314
2025-05-07 12:42:29,693 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [417.38055, 70.08224, 518.1192, 24.13479, 98.17609, 425.38257, 494.49014, 387.8401, 547.1773, 437.33704]
2025-05-07 12:42:29,693 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 95.0, 1000.0, 42.0, 154.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 12:42:29,693 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1124 [INFO]: New best (342.01) for latency ExtremeSparseL4U32
2025-05-07 12:42:29,694 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1127 [INFO]: saving network
2025-05-07 12:42:29,702 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 12:42:29,711 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 26/100 (estimated time remaining: 3 hours, 44 minutes, 6 seconds)
2025-05-07 12:45:22,075 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:45:34,329 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 380.72781 ± 235.997
2025-05-07 12:45:34,330 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [745.873, 433.88806, 227.01904, 33.837254, 581.30585, 469.446, 469.25876, 635.4018, 153.93819, 57.310078]
2025-05-07 12:45:34,330 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 283.0, 31.0, 1000.0, 1000.0, 1000.0, 1000.0, 433.0, 148.0]
2025-05-07 12:45:34,330 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1124 [INFO]: New best (380.73) for latency ExtremeSparseL4U32
2025-05-07 12:45:34,330 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1127 [INFO]: saving network
2025-05-07 12:45:34,334 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 12:45:34,342 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 27/100 (estimated time remaining: 3 hours, 42 minutes, 58 seconds)
2025-05-07 12:48:34,417 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:48:46,875 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 444.53897 ± 210.071
2025-05-07 12:48:46,875 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [461.9359, 265.49884, 568.1051, 526.4595, 556.982, 698.0778, 48.001522, 297.563, 758.62274, 264.14365]
2025-05-07 12:48:46,875 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [661.0, 419.0, 1000.0, 1000.0, 1000.0, 1000.0, 81.0, 399.0, 1000.0, 371.0]
2025-05-07 12:48:46,875 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1124 [INFO]: New best (444.54) for latency ExtremeSparseL4U32
2025-05-07 12:48:46,875 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1127 [INFO]: saving network
2025-05-07 12:48:46,879 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 12:48:46,888 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 28/100 (estimated time remaining: 3 hours, 42 minutes, 34 seconds)
2025-05-07 12:51:36,252 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:51:49,944 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 470.49652 ± 264.088
2025-05-07 12:51:49,945 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [545.52405, 639.83154, 579.6579, 671.6814, 652.45636, 100.83147, 45.263958, 78.09452, 647.6928, 743.93146]
2025-05-07 12:51:49,945 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 279.0, 116.0, 139.0, 1000.0, 1000.0]
2025-05-07 12:51:49,945 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1124 [INFO]: New best (470.50) for latency ExtremeSparseL4U32
2025-05-07 12:51:49,945 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1127 [INFO]: saving network
2025-05-07 12:51:49,949 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 12:51:49,958 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 29/100 (estimated time remaining: 3 hours, 42 minutes, 2 seconds)
2025-05-07 12:54:55,668 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:55:12,981 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 611.03955 ± 182.673
2025-05-07 12:55:12,981 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [694.4247, 697.15424, 679.2018, 720.18427, 671.6376, 695.44916, 687.2559, 77.43602, 573.37885, 614.2736]
2025-05-07 12:55:12,981 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 90.0, 1000.0, 1000.0]
2025-05-07 12:55:12,981 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1124 [INFO]: New best (611.04) for latency ExtremeSparseL4U32
2025-05-07 12:55:12,981 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1127 [INFO]: saving network
2025-05-07 12:55:12,985 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 12:55:12,994 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 30/100 (estimated time remaining: 3 hours, 44 minutes, 48 seconds)
2025-05-07 12:57:57,574 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:58:11,688 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 512.16663 ± 253.473
2025-05-07 12:58:11,689 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [28.022959, 33.431225, 661.58325, 683.1534, 696.3137, 662.2356, 404.5227, 649.4579, 680.2879, 622.6578]
2025-05-07 12:58:11,689 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [42.0, 47.0, 1000.0, 1000.0, 1000.0, 1000.0, 613.0, 1000.0, 1000.0, 1000.0]
2025-05-07 12:58:11,693 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 31/100 (estimated time remaining: 3 hours, 39 minutes, 47 seconds)
2025-05-07 13:01:02,249 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 13:01:19,124 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 595.73883 ± 180.999
2025-05-07 13:01:19,124 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [729.6326, 393.19046, 637.2376, 124.4465, 682.998, 671.0015, 711.2979, 647.3019, 705.40594, 654.87604]
2025-05-07 13:01:19,124 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 739.0, 1000.0, 175.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 13:01:19,128 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 32/100 (estimated time remaining: 3 hours, 37 minutes, 18 seconds)
2025-05-07 13:04:19,173 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 13:04:38,008 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 610.28442 ± 56.547
2025-05-07 13:04:38,008 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [591.14276, 594.5059, 466.76022, 635.51526, 649.16113, 606.4642, 645.362, 590.18524, 690.5188, 633.229]
2025-05-07 13:04:38,009 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 13:04:38,013 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 33/100 (estimated time remaining: 3 hours, 35 minutes, 35 seconds)
2025-05-07 13:07:33,961 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 13:07:53,408 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 598.36694 ± 8.315
2025-05-07 13:07:53,409 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [613.78, 607.4285, 593.407, 586.6083, 591.8917, 592.0557, 607.8566, 598.41394, 599.7836, 592.4443]
2025-05-07 13:07:53,409 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 13:07:53,414 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 34/100 (estimated time remaining: 3 hours, 35 minutes, 10 seconds)
2025-05-07 13:10:48,157 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 13:11:07,477 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 674.29669 ± 7.794
2025-05-07 13:11:07,477 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [670.56635, 670.2953, 658.44293, 684.65356, 677.9091, 670.3481, 673.60254, 681.9902, 684.96356, 670.19543]
2025-05-07 13:11:07,477 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 13:11:07,478 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1124 [INFO]: New best (674.30) for latency ExtremeSparseL4U32
2025-05-07 13:11:07,478 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1127 [INFO]: saving network
2025-05-07 13:11:07,483 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 13:11:07,492 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 35/100 (estimated time remaining: 3 hours, 29 minutes, 59 seconds)
2025-05-07 13:14:03,339 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 13:14:22,646 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 697.99683 ± 14.507
2025-05-07 13:14:22,646 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [695.76276, 692.17194, 689.26117, 723.2855, 710.02893, 709.5755, 694.09296, 666.5947, 693.9214, 705.27344]
2025-05-07 13:14:22,646 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 13:14:22,646 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1124 [INFO]: New best (698.00) for latency ExtremeSparseL4U32
2025-05-07 13:14:22,647 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1127 [INFO]: saving network
2025-05-07 13:14:22,651 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 13:14:22,660 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 36/100 (estimated time remaining: 3 hours, 30 minutes, 22 seconds)
2025-05-07 13:17:17,144 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 13:17:36,373 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 654.26312 ± 20.417
2025-05-07 13:17:36,374 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [694.0173, 647.0829, 620.80865, 671.6047, 663.05914, 647.4849, 635.208, 668.088, 660.894, 634.3836]
2025-05-07 13:17:36,374 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 13:17:36,379 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 37/100 (estimated time remaining: 3 hours, 28 minutes, 28 seconds)
2025-05-07 13:20:31,373 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 13:20:50,598 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 658.95593 ± 14.722
2025-05-07 13:20:50,598 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [642.99274, 662.3554, 679.5401, 663.9758, 654.4188, 687.71716, 642.32574, 662.97363, 648.583, 644.67694]
2025-05-07 13:20:50,598 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 13:20:50,603 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 38/100 (estimated time remaining: 3 hours, 24 minutes, 14 seconds)
2025-05-07 13:23:45,616 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 13:24:05,138 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 698.24133 ± 27.454
2025-05-07 13:24:05,138 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [723.17426, 711.59906, 697.92285, 720.965, 704.9406, 726.37836, 718.60223, 660.2849, 674.95544, 643.59015]
2025-05-07 13:24:05,138 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 13:24:05,138 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1124 [INFO]: New best (698.24) for latency ExtremeSparseL4U32
2025-05-07 13:24:05,139 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1127 [INFO]: saving network
2025-05-07 13:24:05,142 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 13:24:05,152 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 39/100 (estimated time remaining: 3 hours, 20 minutes, 49 seconds)
2025-05-07 13:27:00,249 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 13:27:19,779 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 690.76080 ± 10.918
2025-05-07 13:27:19,779 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [695.6226, 685.0961, 685.3953, 695.875, 688.5345, 702.3283, 698.47253, 666.0841, 705.9737, 684.2261]
2025-05-07 13:27:19,779 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 13:27:19,785 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 40/100 (estimated time remaining: 3 hours, 17 minutes, 41 seconds)
2025-05-07 13:30:14,882 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 13:30:34,339 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 695.62708 ± 13.981
2025-05-07 13:30:34,339 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [702.96655, 671.3035, 706.0571, 700.42554, 713.2678, 694.8659, 702.55664, 684.99384, 707.794, 672.03986]
2025-05-07 13:30:34,340 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 13:30:34,345 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 41/100 (estimated time remaining: 3 hours, 14 minutes, 20 seconds)
2025-05-07 13:33:29,503 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 13:33:49,007 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 737.89679 ± 35.971
2025-05-07 13:33:49,007 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [705.13916, 782.78625, 712.74524, 762.9728, 764.2521, 770.3008, 733.1705, 764.2504, 722.13104, 661.2198]
2025-05-07 13:33:49,007 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 13:33:49,007 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1124 [INFO]: New best (737.90) for latency ExtremeSparseL4U32
2025-05-07 13:33:49,007 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1127 [INFO]: saving network
2025-05-07 13:33:49,011 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 13:33:49,021 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 42/100 (estimated time remaining: 3 hours, 11 minutes, 17 seconds)
2025-05-07 13:36:43,835 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 13:37:03,001 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 740.96619 ± 30.207
2025-05-07 13:37:03,001 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [744.0301, 738.8863, 665.58545, 781.5667, 759.0014, 728.52954, 751.2183, 751.0943, 767.92065, 721.82874]
2025-05-07 13:37:03,001 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 13:37:03,001 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1124 [INFO]: New best (740.97) for latency ExtremeSparseL4U32
2025-05-07 13:37:03,002 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1127 [INFO]: saving network
2025-05-07 13:37:03,005 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 13:37:03,016 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 43/100 (estimated time remaining: 3 hours, 7 minutes, 59 seconds)
2025-05-07 13:39:57,206 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 13:40:16,121 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 733.40198 ± 38.898
2025-05-07 13:40:16,122 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [701.96594, 778.9581, 752.7549, 632.3078, 729.0856, 753.1705, 753.24475, 756.4608, 731.15594, 744.9162]
2025-05-07 13:40:16,122 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 13:40:16,128 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 44/100 (estimated time remaining: 3 hours, 4 minutes, 29 seconds)
2025-05-07 13:43:10,799 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 13:43:30,267 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 707.68719 ± 15.808
2025-05-07 13:43:30,267 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [717.4185, 689.997, 695.2597, 737.2061, 704.8951, 702.2119, 722.46094, 690.55975, 724.47864, 692.3848]
2025-05-07 13:43:30,267 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 13:43:30,272 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 45/100 (estimated time remaining: 3 hours, 1 minute, 9 seconds)
2025-05-07 13:46:25,069 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 13:46:44,445 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 737.70264 ± 18.034
2025-05-07 13:46:44,446 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [739.94135, 746.7921, 731.1842, 716.4679, 767.59717, 749.24994, 701.89185, 726.6787, 749.69116, 747.53204]
2025-05-07 13:46:44,446 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 13:46:44,452 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 46/100 (estimated time remaining: 2 hours, 57 minutes, 51 seconds)
2025-05-07 13:49:39,554 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 13:49:59,001 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 743.47766 ± 21.248
2025-05-07 13:49:59,001 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [750.6423, 735.4457, 687.0268, 765.33466, 740.68384, 753.79767, 747.05066, 768.12244, 743.67444, 742.9985]
2025-05-07 13:49:59,001 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 13:49:59,001 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1124 [INFO]: New best (743.48) for latency ExtremeSparseL4U32
2025-05-07 13:49:59,001 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1127 [INFO]: saving network
2025-05-07 13:49:59,005 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 13:49:59,015 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 47/100 (estimated time remaining: 2 hours, 54 minutes, 35 seconds)
2025-05-07 13:52:54,695 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 13:53:14,177 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 730.67340 ± 13.048
2025-05-07 13:53:14,177 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [756.0292, 739.2463, 723.5384, 723.1335, 709.7113, 725.7275, 728.7675, 732.02783, 720.43805, 748.1143]
2025-05-07 13:53:14,177 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 13:53:14,183 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 48/100 (estimated time remaining: 2 hours, 51 minutes, 34 seconds)
2025-05-07 13:56:08,633 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 13:56:28,148 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 717.39026 ± 18.779
2025-05-07 13:56:28,149 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [708.4894, 731.7806, 741.94604, 732.292, 717.8256, 700.5146, 739.14386, 698.5651, 721.77954, 681.5653]
2025-05-07 13:56:28,149 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 13:56:28,155 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 49/100 (estimated time remaining: 2 hours, 48 minutes, 29 seconds)
2025-05-07 13:59:22,973 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 13:59:42,426 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 726.37061 ± 17.456
2025-05-07 13:59:42,426 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [725.9215, 714.9769, 728.6873, 727.7994, 732.91205, 686.6104, 734.88617, 761.0065, 727.9278, 722.9781]
2025-05-07 13:59:42,426 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 13:59:42,432 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 50/100 (estimated time remaining: 2 hours, 45 minutes, 16 seconds)
2025-05-07 14:02:25,198 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 14:02:44,578 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 703.81207 ± 11.553
2025-05-07 14:02:44,578 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [714.74457, 718.4284, 688.69495, 706.6375, 707.5827, 716.2116, 708.2978, 681.1639, 698.7583, 697.601]
2025-05-07 14:02:44,579 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 14:02:44,584 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 51/100 (estimated time remaining: 2 hours, 40 minutes, 1 second)
2025-05-07 14:05:39,198 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 14:05:58,519 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 671.26447 ± 30.805
2025-05-07 14:05:58,519 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [654.95844, 607.8139, 636.23755, 712.3733, 687.1275, 685.77374, 675.65625, 700.6734, 697.1008, 654.92993]
2025-05-07 14:05:58,520 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 14:05:58,526 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 52/100 (estimated time remaining: 2 hours, 36 minutes, 43 seconds)
2025-05-07 14:08:53,251 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 14:09:12,585 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 736.85571 ± 39.817
2025-05-07 14:09:12,585 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [772.0554, 659.8337, 744.1986, 662.9741, 750.2868, 776.5755, 738.49963, 745.44794, 772.06757, 746.6175]
2025-05-07 14:09:12,585 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 14:09:12,591 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 53/100 (estimated time remaining: 2 hours, 33 minutes, 20 seconds)
2025-05-07 14:12:07,665 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 14:12:25,277 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 689.59003 ± 218.205
2025-05-07 14:12:25,277 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [759.0865, 767.2484, 782.6364, 748.7717, 36.618603, 775.4453, 775.79865, 725.56934, 757.2302, 767.4952]
2025-05-07 14:12:25,277 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 64.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 14:12:25,283 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 54/100 (estimated time remaining: 2 hours, 29 minutes, 57 seconds)
2025-05-07 14:15:20,027 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 14:15:39,134 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 758.90637 ± 17.361
2025-05-07 14:15:39,134 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [771.6789, 757.1826, 775.0944, 764.0852, 776.3503, 750.04224, 725.9838, 747.6372, 738.5919, 782.4171]
2025-05-07 14:15:39,134 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 14:15:39,134 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1124 [INFO]: New best (758.91) for latency ExtremeSparseL4U32
2025-05-07 14:15:39,134 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1127 [INFO]: saving network
2025-05-07 14:15:39,138 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 14:15:39,150 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 55/100 (estimated time remaining: 2 hours, 26 minutes, 41 seconds)
2025-05-07 14:18:34,030 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 14:18:53,335 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 755.24542 ± 14.693
2025-05-07 14:18:53,335 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [773.7123, 749.84607, 718.1804, 749.8628, 760.0462, 756.0998, 765.6279, 770.9888, 755.3805, 752.70966]
2025-05-07 14:18:53,336 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 14:18:53,342 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 56/100 (estimated time remaining: 2 hours, 25 minutes, 18 seconds)
2025-05-07 14:21:48,028 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 14:22:07,318 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 778.31885 ± 15.347
2025-05-07 14:22:07,318 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [768.4098, 783.9514, 809.9026, 767.37317, 776.9477, 775.19525, 748.4001, 782.4548, 778.67236, 791.8816]
2025-05-07 14:22:07,318 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 14:22:07,318 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1124 [INFO]: New best (778.32) for latency ExtremeSparseL4U32
2025-05-07 14:22:07,319 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1127 [INFO]: saving network
2025-05-07 14:22:07,322 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 14:22:07,334 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 57/100 (estimated time remaining: 2 hours, 22 minutes, 5 seconds)
2025-05-07 14:25:05,873 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 14:25:26,055 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 740.32025 ± 18.860
2025-05-07 14:25:26,055 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [688.6297, 732.0306, 749.1388, 749.4932, 752.21155, 736.1283, 745.71643, 743.67004, 761.12036, 745.06396]
2025-05-07 14:25:26,055 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 14:25:26,063 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 58/100 (estimated time remaining: 2 hours, 19 minutes, 31 seconds)
2025-05-07 14:28:24,955 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 14:28:45,000 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 767.31781 ± 12.801
2025-05-07 14:28:45,000 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [765.7152, 777.6135, 767.8422, 783.1517, 788.42316, 762.56036, 762.4787, 765.4833, 759.477, 740.4333]
2025-05-07 14:28:45,000 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 14:28:45,006 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 59/100 (estimated time remaining: 2 hours, 17 minutes, 9 seconds)
2025-05-07 14:31:43,439 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 14:32:03,563 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 786.94763 ± 28.849
2025-05-07 14:32:03,563 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [736.0419, 743.44025, 815.75165, 777.4805, 798.44543, 775.63226, 819.6154, 818.0336, 775.73883, 809.29694]
2025-05-07 14:32:03,563 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 14:32:03,564 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1124 [INFO]: New best (786.95) for latency ExtremeSparseL4U32
2025-05-07 14:32:03,564 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1127 [INFO]: saving network
2025-05-07 14:32:03,568 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 14:32:03,580 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 14 minutes, 32 seconds)
2025-05-07 14:35:02,041 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 14:35:22,164 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 800.87146 ± 21.434
2025-05-07 14:35:22,164 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [772.3158, 776.57947, 804.3393, 822.64124, 834.3036, 806.00635, 787.9753, 812.2753, 771.4653, 820.8132]
2025-05-07 14:35:22,164 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 14:35:22,164 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1124 [INFO]: New best (800.87) for latency ExtremeSparseL4U32
2025-05-07 14:35:22,165 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1127 [INFO]: saving network
2025-05-07 14:35:22,169 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 14:35:22,181 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 61/100 (estimated time remaining: 2 hours, 11 minutes, 50 seconds)
2025-05-07 14:38:20,799 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 14:38:40,823 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 685.26953 ± 14.791
2025-05-07 14:38:40,823 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [702.4954, 676.94904, 649.3898, 689.82355, 685.2021, 685.9057, 688.13367, 702.9777, 695.67426, 676.14404]
2025-05-07 14:38:40,823 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 14:38:40,833 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 62/100 (estimated time remaining: 2 hours, 9 minutes, 9 seconds)
2025-05-07 14:41:39,228 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 14:41:59,424 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 722.69434 ± 12.097
2025-05-07 14:41:59,425 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [693.94464, 711.56494, 725.1829, 733.45154, 719.50995, 723.98926, 737.5077, 719.2307, 730.8736, 731.6879]
2025-05-07 14:41:59,425 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 14:41:59,432 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 63/100 (estimated time remaining: 2 hours, 5 minutes, 49 seconds)
2025-05-07 14:44:58,272 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 14:45:18,189 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 727.18079 ± 12.634
2025-05-07 14:45:18,189 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [702.66284, 726.70514, 713.175, 743.6551, 715.6507, 740.85474, 729.08997, 730.9159, 728.1795, 740.91895]
2025-05-07 14:45:18,189 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 14:45:18,197 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 64/100 (estimated time remaining: 2 hours, 2 minutes, 29 seconds)
2025-05-07 14:48:16,991 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 14:48:36,641 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 798.89557 ± 42.550
2025-05-07 14:48:36,641 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [823.12634, 818.0945, 828.19226, 859.7558, 759.8934, 790.1211, 843.625, 769.81885, 786.86346, 709.4654]
2025-05-07 14:48:36,641 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 14:48:36,648 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 59 minutes, 10 seconds)
2025-05-07 14:51:35,429 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 14:51:55,263 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 730.01117 ± 20.638
2025-05-07 14:51:55,263 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [776.60205, 720.65704, 710.2255, 713.62946, 735.5024, 725.1975, 712.0239, 716.3904, 732.04156, 757.8419]
2025-05-07 14:51:55,263 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 14:51:55,271 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 55 minutes, 51 seconds)
2025-05-07 14:54:54,398 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 14:55:13,251 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 815.89844 ± 14.303
2025-05-07 14:55:13,251 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [835.11884, 818.4991, 795.52716, 807.88116, 811.79645, 826.54553, 808.33527, 803.4321, 807.9473, 843.9016]
2025-05-07 14:55:13,251 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 14:55:13,252 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1124 [INFO]: New best (815.90) for latency ExtremeSparseL4U32
2025-05-07 14:55:13,252 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1127 [INFO]: saving network
2025-05-07 14:55:13,256 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 14:55:13,269 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 52 minutes, 28 seconds)
2025-05-07 14:58:12,919 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 14:58:32,042 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 770.05487 ± 7.633
2025-05-07 14:58:32,042 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [776.8678, 765.7814, 779.53613, 760.90045, 780.05945, 768.5654, 768.16315, 758.8713, 763.4221, 778.38196]
2025-05-07 14:58:32,042 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 14:58:32,049 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 49 minutes, 11 seconds)
2025-05-07 15:01:32,806 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 15:01:51,720 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 768.19617 ± 17.327
2025-05-07 15:01:51,720 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [765.0446, 778.83344, 769.1952, 781.59015, 761.6233, 754.9923, 802.9484, 757.4694, 775.3431, 734.92163]
2025-05-07 15:01:51,720 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 15:01:51,729 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 45 minutes, 58 seconds)
2025-05-07 15:04:51,332 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 15:05:10,728 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 693.42816 ± 201.450
2025-05-07 15:05:10,728 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [767.55304, 767.9157, 741.90027, 799.0382, 751.46826, 758.1504, 90.85301, 745.20795, 762.59906, 749.5957]
2025-05-07 15:05:10,728 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 15:05:10,736 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 42 minutes, 43 seconds)
2025-05-07 15:08:09,435 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 15:08:29,362 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 753.32233 ± 21.873
2025-05-07 15:08:29,362 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [775.5254, 747.0099, 718.38135, 767.1569, 774.6418, 766.1335, 753.01135, 711.4967, 746.0834, 773.7829]
2025-05-07 15:08:29,362 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 15:08:29,371 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 39 minutes, 24 seconds)
2025-05-07 15:11:27,914 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 15:11:47,911 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 788.20660 ± 16.985
2025-05-07 15:11:47,911 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [795.02, 790.366, 765.8019, 800.7389, 807.4748, 808.0227, 778.1229, 802.72437, 757.1421, 776.65234]
2025-05-07 15:11:47,911 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 15:11:47,919 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 36 minutes, 8 seconds)
2025-05-07 15:14:46,159 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 15:15:06,127 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 771.89642 ± 19.710
2025-05-07 15:15:06,127 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [753.45276, 730.09906, 790.94794, 792.23883, 792.00195, 770.67804, 759.7441, 777.7157, 790.16907, 761.91766]
2025-05-07 15:15:06,127 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 15:15:06,136 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 32 minutes, 46 seconds)
2025-05-07 15:18:04,681 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 15:18:24,941 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 798.49622 ± 23.907
2025-05-07 15:18:24,941 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [803.43787, 784.0034, 764.9028, 812.3906, 833.72394, 817.788, 817.38855, 802.5144, 796.9057, 751.9066]
2025-05-07 15:18:24,941 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 15:18:24,950 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 29 minutes, 23 seconds)
2025-05-07 15:21:31,175 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 15:21:51,196 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 819.12012 ± 10.474
2025-05-07 15:21:51,196 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [795.5792, 819.78033, 818.4152, 823.88226, 838.83203, 823.04584, 822.59485, 810.8081, 823.2562, 815.0067]
2025-05-07 15:21:51,197 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 15:21:51,197 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1124 [INFO]: New best (819.12) for latency ExtremeSparseL4U32
2025-05-07 15:21:51,197 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1127 [INFO]: saving network
2025-05-07 15:21:51,201 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 15:21:51,216 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 26 minutes, 42 seconds)
2025-05-07 15:24:49,884 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 15:25:09,892 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 766.35419 ± 29.101
2025-05-07 15:25:09,892 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [746.31775, 745.04706, 773.22174, 783.6722, 793.5037, 809.7669, 798.3505, 758.99786, 711.1891, 743.47516]
2025-05-07 15:25:09,892 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 15:25:09,902 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 23 minutes, 22 seconds)
2025-05-07 15:28:09,191 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 15:28:31,609 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 776.17792 ± 9.420
2025-05-07 15:28:31,610 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [772.4307, 792.6892, 765.8619, 765.78564, 778.12585, 762.34766, 774.96716, 778.9864, 782.49274, 788.0923]
2025-05-07 15:28:31,610 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 15:28:31,621 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 20 minutes, 17 seconds)
2025-05-07 15:31:34,295 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 15:31:54,033 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 744.04144 ± 23.579
2025-05-07 15:31:54,034 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [717.1937, 738.9106, 775.5847, 742.1441, 748.9002, 750.206, 724.7261, 762.12244, 779.2007, 701.42615]
2025-05-07 15:31:54,034 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 15:31:54,043 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 17 minutes, 16 seconds)
2025-05-07 15:35:10,809 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 15:35:35,584 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 738.76654 ± 17.030
2025-05-07 15:35:35,584 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [734.8931, 750.3855, 746.2432, 702.95087, 762.13464, 726.52234, 753.8947, 751.8237, 737.8121, 721.00543]
2025-05-07 15:35:35,584 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 15:35:35,596 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 15 minutes, 34 seconds)
2025-05-07 15:38:45,649 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 15:39:04,755 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 778.18616 ± 53.389
2025-05-07 15:39:04,755 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [799.48804, 739.25775, 776.95483, 782.0707, 798.3861, 826.1635, 637.59705, 823.83417, 822.381, 775.7292]
2025-05-07 15:39:04,755 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 15:39:04,764 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 12 minutes, 20 seconds)
2025-05-07 15:41:56,542 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 15:42:15,614 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 762.86292 ± 15.367
2025-05-07 15:42:15,615 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [741.7879, 768.5093, 752.5969, 763.12146, 785.7938, 757.89496, 784.7519, 772.6803, 737.32367, 764.1691]
2025-05-07 15:42:15,615 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 15:42:15,624 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 8 minutes, 22 seconds)
2025-05-07 15:45:08,970 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 15:45:28,125 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 786.17810 ± 26.381
2025-05-07 15:45:28,125 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [795.2007, 810.2583, 778.8017, 799.34064, 726.69855, 798.8829, 816.2902, 750.92554, 800.9769, 784.40515]
2025-05-07 15:45:28,125 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 15:45:28,134 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 82/100 (estimated time remaining: 1 hour, 4 minutes, 22 seconds)
2025-05-07 15:48:21,517 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 15:48:40,616 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 777.12732 ± 23.865
2025-05-07 15:48:40,616 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [815.7881, 727.53253, 775.554, 778.27625, 763.9156, 807.3056, 786.3601, 789.20526, 769.0695, 758.26666]
2025-05-07 15:48:40,616 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 15:48:40,625 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 83/100 (estimated time remaining: 1 hour, 23 seconds)
2025-05-07 15:51:38,396 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 15:51:57,618 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 802.45392 ± 17.445
2025-05-07 15:51:57,618 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [796.0212, 822.8414, 789.2281, 781.07495, 805.8732, 804.04004, 829.2209, 810.3128, 770.8079, 815.11847]
2025-05-07 15:51:57,618 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 15:51:57,627 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 84/100 (estimated time remaining: 55 minutes, 38 seconds)
2025-05-07 15:54:55,048 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 15:55:14,042 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 799.80627 ± 5.220
2025-05-07 15:55:14,042 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [798.3726, 793.827, 798.2649, 803.85095, 790.5049, 799.73175, 796.525, 805.5841, 808.5127, 802.88934]
2025-05-07 15:55:14,042 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 15:55:14,052 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 85/100 (estimated time remaining: 51 minutes, 41 seconds)
2025-05-07 15:58:08,184 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 15:58:27,056 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 792.76752 ± 13.644
2025-05-07 15:58:27,056 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [800.6911, 799.80963, 772.14355, 803.3532, 773.0747, 801.2465, 782.22296, 787.7855, 789.9203, 817.4276]
2025-05-07 15:58:27,056 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 15:58:27,065 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 86/100 (estimated time remaining: 48 minutes, 34 seconds)
2025-05-07 16:01:22,926 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 16:01:42,121 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 797.69507 ± 23.577
2025-05-07 16:01:42,122 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [790.2103, 784.7331, 805.29785, 775.7004, 836.3942, 758.79584, 822.29645, 804.9715, 775.6183, 822.9331]
2025-05-07 16:01:42,122 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 16:01:42,131 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 87/100 (estimated time remaining: 45 minutes, 27 seconds)
2025-05-07 16:04:39,241 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 16:04:59,061 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 814.22760 ± 8.920
2025-05-07 16:04:59,061 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [819.1348, 823.4385, 828.4153, 813.4812, 807.7102, 811.50543, 810.94617, 819.74896, 813.4966, 794.39886]
2025-05-07 16:04:59,061 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 16:04:59,070 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 88/100 (estimated time remaining: 42 minutes, 23 seconds)
2025-05-07 16:07:58,007 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 16:08:16,517 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 728.44977 ± 130.041
2025-05-07 16:08:16,517 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [741.90533, 790.08093, 349.4195, 843.9728, 785.1376, 738.7063, 749.73224, 753.0269, 787.26587, 745.2503]
2025-05-07 16:08:16,518 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 535.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 16:08:16,528 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 89/100 (estimated time remaining: 39 minutes, 9 seconds)
2025-05-07 16:11:14,588 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 16:11:34,441 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 827.69092 ± 8.441
2025-05-07 16:11:34,441 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [829.7432, 828.8618, 817.92426, 836.2283, 847.11273, 819.5673, 824.43146, 823.4174, 819.8089, 829.8144]
2025-05-07 16:11:34,441 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 16:11:34,441 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1124 [INFO]: New best (827.69) for latency ExtremeSparseL4U32
2025-05-07 16:11:34,441 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1127 [INFO]: saving network
2025-05-07 16:11:34,445 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 16:11:34,461 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 90/100 (estimated time remaining: 35 minutes, 56 seconds)
2025-05-07 16:14:32,099 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 16:14:51,723 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 812.42517 ± 27.572
2025-05-07 16:14:51,723 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [821.7361, 775.0597, 832.0927, 786.24664, 875.8099, 812.1166, 799.1312, 830.09314, 795.77985, 796.1857]
2025-05-07 16:14:51,723 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 16:14:51,734 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 91/100 (estimated time remaining: 32 minutes, 49 seconds)
2025-05-07 16:17:49,228 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 16:18:09,026 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 801.04431 ± 21.581
2025-05-07 16:18:09,026 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [851.2438, 773.7874, 781.1304, 795.20795, 788.13416, 793.12256, 807.0202, 826.04016, 792.519, 802.2374]
2025-05-07 16:18:09,026 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 16:18:09,037 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 92/100 (estimated time remaining: 29 minutes, 36 seconds)
2025-05-07 16:21:05,633 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 16:21:25,450 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 794.12933 ± 6.788
2025-05-07 16:21:25,450 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [806.11816, 795.58405, 799.7287, 782.9066, 801.7806, 787.7247, 788.9579, 796.1536, 793.37286, 788.96643]
2025-05-07 16:21:25,450 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 16:21:25,461 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 93/100 (estimated time remaining: 26 minutes, 18 seconds)
2025-05-07 16:24:22,660 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 16:24:42,361 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 807.49084 ± 12.722
2025-05-07 16:24:42,361 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [798.34296, 804.4315, 836.86554, 809.41174, 793.12354, 797.92474, 794.8595, 820.77405, 812.4272, 806.7474]
2025-05-07 16:24:42,361 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 16:24:42,372 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 94/100 (estimated time remaining: 23 minutes)
2025-05-07 16:27:40,426 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 16:28:00,054 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 779.78308 ± 126.666
2025-05-07 16:28:00,054 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [826.11664, 822.6255, 406.36847, 831.93726, 835.67303, 820.5745, 838.46893, 826.84174, 835.94196, 753.2834]
2025-05-07 16:28:00,054 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 16:28:00,066 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 95/100 (estimated time remaining: 19 minutes, 42 seconds)
2025-05-07 16:30:57,806 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 16:31:17,645 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 770.22314 ± 28.002
2025-05-07 16:31:17,645 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [766.2056, 769.1636, 771.5979, 790.2615, 758.19934, 772.54724, 819.6813, 798.555, 712.8766, 743.14343]
2025-05-07 16:31:17,645 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 16:31:17,657 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 96/100 (estimated time remaining: 16 minutes, 25 seconds)
2025-05-07 16:34:05,960 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 16:34:25,741 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 830.26465 ± 12.767
2025-05-07 16:34:25,742 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [824.12683, 838.27045, 858.7094, 824.0041, 813.07135, 829.0039, 833.89233, 812.23865, 836.36725, 832.9624]
2025-05-07 16:34:25,742 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 16:34:25,742 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1124 [INFO]: New best (830.26) for latency ExtremeSparseL4U32
2025-05-07 16:34:25,742 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1127 [INFO]: saving network
2025-05-07 16:34:25,746 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 16:34:25,762 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 97/100 (estimated time remaining: 13 minutes, 1 second)
2025-05-07 16:37:23,994 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 16:37:44,045 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 784.88806 ± 30.693
2025-05-07 16:37:44,045 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [746.57526, 735.3798, 796.0702, 797.758, 742.01465, 804.355, 794.58936, 834.62036, 794.1997, 803.31836]
2025-05-07 16:37:44,045 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 16:37:44,057 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 98/100 (estimated time remaining: 9 minutes, 47 seconds)
2025-05-07 16:40:49,647 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 16:41:10,048 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 827.50800 ± 16.470
2025-05-07 16:41:10,048 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [825.073, 832.41473, 830.4554, 825.4054, 809.8725, 841.25464, 789.1419, 849.52515, 841.87384, 830.06354]
2025-05-07 16:41:10,048 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 16:41:10,061 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 99/100 (estimated time remaining: 6 minutes, 35 seconds)
2025-05-07 16:44:24,346 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 16:44:44,348 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 833.91016 ± 33.552
2025-05-07 16:44:44,349 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [827.71454, 835.6736, 860.2831, 830.01025, 821.78033, 788.5435, 779.798, 860.158, 832.8075, 902.333]
2025-05-07 16:44:44,349 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 16:44:44,349 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1124 [INFO]: New best (833.91) for latency ExtremeSparseL4U32
2025-05-07 16:44:44,349 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1127 [INFO]: saving network
2025-05-07 16:44:44,353 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-ant/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 16:44:44,371 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1097 [INFO]: Iteration 100/100 (estimated time remaining: 3 minutes, 20 seconds)
2025-05-07 16:47:51,574 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 16:48:11,514 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1119 [DEBUG]: Total Reward: 786.86945 ± 16.100
2025-05-07 16:48:11,514 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1120 [DEBUG]: All rewards: [811.03296, 771.565, 778.3517, 803.91736, 794.88104, 797.55756, 753.5861, 779.24677, 794.5224, 784.0333]
2025-05-07 16:48:11,514 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 16:48:11,526 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1149 [DEBUG]: Training session finished
