2025-05-10 06:34:39,404 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-sac-aug-mem16
2025-05-10 06:34:39,404 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-sac-aug-mem16
2025-05-10 06:34:39,404 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x735655440f70>}
2025-05-10 06:34:39,404 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1111 [DEBUG]: using device: cpu
2025-05-10 06:34:39,405 baseline-sac-noisy-ant:77 [WARNING]: args.memorize_actions != args.horizon: 16 != 24
2025-05-10 06:34:39,410 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1133 [INFO]: Creating new trainer
2025-05-10 06:34:39,416 baseline-sac-noisy-ant:111 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=155, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1., -1., -1.]]))
)
2025-05-10 06:34:39,416 baseline-sac-noisy-ant:112 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=163, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-10 06:34:39,605 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1194 [DEBUG]: Starting training session...
2025-05-10 06:34:39,605 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 1/100
2025-05-10 06:37:25,058 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 06:37:26,586 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -95.27549 ± 63.857
2025-05-10 06:37:26,586 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-73.90695, 8.847701, -186.19585, -119.65895, -180.85634, -88.54355, -148.27591, 1.7339208, -63.9156, -101.98333]
2025-05-10 06:37:26,586 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [57.0, 22.0, 156.0, 95.0, 124.0, 65.0, 122.0, 22.0, 52.0, 80.0]
2025-05-10 06:37:26,586 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1226 [INFO]: New best (-95.28) for latency MM1Queue_a033_s075
2025-05-10 06:37:26,586 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1229 [INFO]: saving network
2025-05-10 06:37:26,590 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 06:37:26,596 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 2/100 (estimated time remaining: 4 hours, 35 minutes, 32 seconds)
2025-05-10 06:40:23,069 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 06:40:24,206 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -26.59741 ± 48.036
2025-05-10 06:40:24,206 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [14.3068495, 15.049645, -11.349557, -86.18552, 11.820575, -37.032574, -131.98433, -55.739197, 5.538653, 9.601292]
2025-05-10 06:40:24,206 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [24.0, 26.0, 40.0, 135.0, 34.0, 42.0, 127.0, 73.0, 38.0, 58.0]
2025-05-10 06:40:24,207 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1226 [INFO]: New best (-26.60) for latency MM1Queue_a033_s075
2025-05-10 06:40:24,207 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1229 [INFO]: saving network
2025-05-10 06:40:24,211 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 06:40:24,217 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 3/100 (estimated time remaining: 4 hours, 41 minutes, 25 seconds)
2025-05-10 06:43:18,253 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 06:43:20,516 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -56.21397 ± 73.424
2025-05-10 06:43:20,516 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-219.23688, -23.09548, -76.706535, -134.45882, 7.1532645, 12.592726, -48.689236, 7.942146, 13.037275, -100.6782]
2025-05-10 06:43:20,516 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [244.0, 144.0, 152.0, 186.0, 41.0, 42.0, 77.0, 52.0, 54.0, 178.0]
2025-05-10 06:43:20,518 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 4/100 (estimated time remaining: 4 hours, 40 minutes, 42 seconds)
2025-05-10 06:46:03,168 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 06:46:05,651 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -40.42281 ± 58.847
2025-05-10 06:46:05,651 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-17.781822, -192.29398, 14.191454, -3.0323014, -36.05571, 7.0574117, -96.69635, -45.049583, -8.941634, -25.625546]
2025-05-10 06:46:05,651 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [82.0, 208.0, 54.0, 66.0, 239.0, 67.0, 248.0, 143.0, 96.0, 82.0]
2025-05-10 06:46:05,653 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 5/100 (estimated time remaining: 4 hours, 34 minutes, 25 seconds)
2025-05-10 06:48:59,420 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 06:49:03,747 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -84.91589 ± 142.848
2025-05-10 06:49:03,747 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-66.17953, -128.33089, 3.2165291, 4.3621264, -14.550732, -36.210552, -40.383045, -498.5761, -57.73724, -14.7694235]
2025-05-10 06:49:03,747 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [195.0, 138.0, 103.0, 64.0, 56.0, 145.0, 98.0, 1000.0, 134.0, 139.0]
2025-05-10 06:49:03,749 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 6/100 (estimated time remaining: 4 hours, 33 minutes, 38 seconds)
2025-05-10 06:51:57,881 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 06:52:00,007 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -28.75638 ± 60.299
2025-05-10 06:52:00,007 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [12.276436, -9.104048, -90.72375, 7.366305, 44.648922, 2.3103695, -115.807915, 9.933795, -5.022294, -143.44164]
2025-05-10 06:52:00,007 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [83.0, 46.0, 119.0, 47.0, 114.0, 46.0, 151.0, 37.0, 49.0, 399.0]
2025-05-10 06:52:00,009 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 7/100 (estimated time remaining: 4 hours, 33 minutes, 40 seconds)
2025-05-10 06:54:54,684 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 06:54:56,271 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -12.82960 ± 55.204
2025-05-10 06:54:56,271 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-170.43797, 3.6138742, 13.193396, 24.439272, 14.650699, -38.170235, -9.055822, 0.9287129, 14.887857, 17.654211]
2025-05-10 06:54:56,272 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [323.0, 57.0, 26.0, 89.0, 81.0, 80.0, 82.0, 29.0, 27.0, 22.0]
2025-05-10 06:54:56,272 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1226 [INFO]: New best (-12.83) for latency MM1Queue_a033_s075
2025-05-10 06:54:56,272 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1229 [INFO]: saving network
2025-05-10 06:54:56,276 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 06:54:56,305 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 8/100 (estimated time remaining: 4 hours, 30 minutes, 20 seconds)
2025-05-10 06:58:03,460 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 06:58:11,010 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -156.06107 ± 238.319
2025-05-10 06:58:11,010 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [15.797066, -20.69248, -530.58966, -537.6858, -6.6312084, 8.774925, -489.50238, 14.9079895, 0.7077867, -15.696865]
2025-05-10 06:58:11,010 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [54.0, 72.0, 1000.0, 1000.0, 48.0, 87.0, 1000.0, 31.0, 54.0, 59.0]
2025-05-10 06:58:11,012 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 9/100 (estimated time remaining: 4 hours, 33 minutes, 5 seconds)
2025-05-10 07:00:53,617 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 07:00:54,901 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -24.14393 ± 31.358
2025-05-10 07:00:54,901 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-10.804566, 14.740574, -42.941463, -70.44268, -32.76183, -2.7951574, -1.1446663, -84.254074, 6.324477, -17.359938]
2025-05-10 07:00:54,901 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [44.0, 34.0, 93.0, 113.0, 91.0, 50.0, 37.0, 118.0, 38.0, 60.0]
2025-05-10 07:00:54,903 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 10/100 (estimated time remaining: 4 hours, 29 minutes, 44 seconds)
2025-05-10 07:03:51,857 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 07:03:55,380 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -51.30516 ± 109.272
2025-05-10 07:03:55,380 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-370.5279, -42.76045, 0.028406125, 6.8065333, 22.760668, 1.4734303, -0.9811265, -27.297981, -47.10978, -55.44346]
2025-05-10 07:03:55,380 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 95.0, 43.0, 38.0, 31.0, 78.0, 34.0, 90.0, 118.0, 137.0]
2025-05-10 07:03:55,383 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 11/100 (estimated time remaining: 4 hours, 27 minutes, 29 seconds)
2025-05-10 07:06:46,912 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 07:06:50,000 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -48.53746 ± 139.018
2025-05-10 07:06:50,000 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [14.914038, -56.860977, 9.570171, -12.247417, 4.695691, 35.589676, -5.8626122, -458.8536, 15.48587, -31.805405]
2025-05-10 07:06:50,000 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [36.0, 71.0, 35.0, 43.0, 64.0, 51.0, 44.0, 1000.0, 31.0, 66.0]
2025-05-10 07:06:50,003 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 12/100 (estimated time remaining: 4 hours, 24 minutes, 1 second)
2025-05-10 07:09:37,206 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 07:09:38,218 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -5.19946 ± 21.606
2025-05-10 07:09:38,218 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-18.011852, 17.869602, -34.47346, -0.9192086, 16.241142, -4.377209, -7.7576447, -1.5037425, 26.375067, -45.437298]
2025-05-10 07:09:38,219 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [52.0, 23.0, 53.0, 35.0, 31.0, 111.0, 66.0, 36.0, 51.0, 75.0]
2025-05-10 07:09:38,219 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1226 [INFO]: New best (-5.20) for latency MM1Queue_a033_s075
2025-05-10 07:09:38,219 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1229 [INFO]: saving network
2025-05-10 07:09:38,223 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 07:09:38,230 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 13/100 (estimated time remaining: 4 hours, 18 minutes, 41 seconds)
2025-05-10 07:12:32,496 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 07:12:33,760 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -16.95984 ± 37.438
2025-05-10 07:12:33,761 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-93.438644, 8.627524, 3.8991885, -24.02359, 32.089085, -64.87536, 13.337198, -42.58727, 10.118926, -12.745433]
2025-05-10 07:12:33,761 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [193.0, 42.0, 68.0, 49.0, 50.0, 74.0, 37.0, 78.0, 25.0, 47.0]
2025-05-10 07:12:33,764 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 14/100 (estimated time remaining: 4 hours, 10 minutes, 11 seconds)
2025-05-10 07:15:40,326 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 07:15:41,359 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 0.32683 ± 20.882
2025-05-10 07:15:41,359 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-44.15738, 7.798229, -8.099064, 12.702924, 6.2591524, 5.1839004, 21.97322, 11.133632, -31.27627, 21.75]
2025-05-10 07:15:41,360 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [95.0, 70.0, 108.0, 23.0, 40.0, 62.0, 25.0, 47.0, 58.0, 29.0]
2025-05-10 07:15:41,360 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1226 [INFO]: New best (0.33) for latency MM1Queue_a033_s075
2025-05-10 07:15:41,360 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1229 [INFO]: saving network
2025-05-10 07:15:41,364 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 07:15:41,372 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 15/100 (estimated time remaining: 4 hours, 14 minutes, 7 seconds)
2025-05-10 07:18:19,399 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 07:18:20,423 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -15.67969 ± 36.033
2025-05-10 07:18:20,423 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [26.824106, -102.95173, -30.902111, -17.493464, -16.764929, -38.24594, 0.5740693, -17.619982, 22.770746, 17.012325]
2025-05-10 07:18:20,423 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [33.0, 113.0, 61.0, 68.0, 50.0, 63.0, 45.0, 47.0, 34.0, 36.0]
2025-05-10 07:18:20,427 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 16/100 (estimated time remaining: 4 hours, 5 minutes, 5 seconds)
2025-05-10 07:21:13,545 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 07:21:17,254 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -64.28810 ± 104.575
2025-05-10 07:21:17,254 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-22.75482, -80.763695, -121.545456, -27.2938, 14.173261, 5.922401, 45.076298, 3.258988, -128.15755, -330.79657]
2025-05-10 07:21:17,254 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [86.0, 98.0, 143.0, 58.0, 60.0, 42.0, 87.0, 50.0, 159.0, 1000.0]
2025-05-10 07:21:17,258 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 17/100 (estimated time remaining: 4 hours, 2 minutes, 49 seconds)
2025-05-10 07:24:03,901 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 07:24:05,701 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -38.83495 ± 36.090
2025-05-10 07:24:05,701 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-10.766392, -28.935328, -77.98195, -60.228504, -42.645485, -78.26171, -1.8611368, -3.0042884, 12.003345, -96.66808]
2025-05-10 07:24:05,701 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [73.0, 96.0, 100.0, 115.0, 62.0, 174.0, 64.0, 67.0, 52.0, 161.0]
2025-05-10 07:24:05,704 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 18/100 (estimated time remaining: 4 hours)
2025-05-10 07:26:59,127 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 07:27:00,225 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -20.53912 ± 36.507
2025-05-10 07:27:00,225 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [13.597722, 17.592432, -24.674927, -15.266344, -11.462246, -35.9199, -116.520485, -33.248425, 7.5219884, -7.011062]
2025-05-10 07:27:00,225 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [27.0, 23.0, 61.0, 67.0, 70.0, 60.0, 129.0, 59.0, 47.0, 49.0]
2025-05-10 07:27:00,229 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 19/100 (estimated time remaining: 3 hours, 56 minutes, 50 seconds)
2025-05-10 07:29:50,074 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 07:29:50,846 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 11.44776 ± 7.272
2025-05-10 07:29:50,846 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [17.539762, 14.689931, 12.218036, 13.086401, 15.236946, 15.714876, 20.910433, 0.4153976, 8.162889, -3.4971156]
2025-05-10 07:29:50,846 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [45.0, 24.0, 60.0, 60.0, 33.0, 30.0, 23.0, 40.0, 44.0, 58.0]
2025-05-10 07:29:50,846 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1226 [INFO]: New best (11.45) for latency MM1Queue_a033_s075
2025-05-10 07:29:50,847 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1229 [INFO]: saving network
2025-05-10 07:29:50,850 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 07:29:50,868 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 20/100 (estimated time remaining: 3 hours, 49 minutes, 21 seconds)
2025-05-10 07:32:40,585 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 07:32:48,441 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -123.45625 ± 247.687
2025-05-10 07:32:48,441 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-13.184671, -11.632426, -123.069725, 162.064, -16.083445, -754.8651, -54.336884, 1.6773864, -38.236443, -386.8952]
2025-05-10 07:32:48,441 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [65.0, 56.0, 108.0, 1000.0, 74.0, 1000.0, 66.0, 54.0, 110.0, 1000.0]
2025-05-10 07:32:48,445 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 21/100 (estimated time remaining: 3 hours, 51 minutes, 28 seconds)
2025-05-10 07:35:42,158 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 07:35:43,123 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -8.50478 ± 25.280
2025-05-10 07:35:43,123 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-6.8124337, 7.275365, 6.1936717, 27.164383, 25.728378, 2.3569536, -40.7963, -40.62061, -40.149323, -25.387888]
2025-05-10 07:35:43,123 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [55.0, 22.0, 23.0, 44.0, 40.0, 31.0, 83.0, 64.0, 82.0, 75.0]
2025-05-10 07:35:43,127 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 22/100 (estimated time remaining: 3 hours, 48 minutes)
2025-05-10 07:38:37,486 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 07:38:38,871 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -19.46472 ± 37.241
2025-05-10 07:38:38,871 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-22.324986, -27.301456, 15.932985, -3.2284837, 8.173109, 2.0303957, -49.01908, -116.38938, 6.6810837, -9.201368]
2025-05-10 07:38:38,871 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [79.0, 43.0, 77.0, 55.0, 83.0, 49.0, 68.0, 196.0, 49.0, 49.0]
2025-05-10 07:38:38,876 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 23/100 (estimated time remaining: 3 hours, 47 minutes, 1 second)
2025-05-10 07:41:25,811 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 07:41:27,102 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -15.78996 ± 52.437
2025-05-10 07:41:27,103 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-147.42375, 11.866078, -1.2093449, 16.020742, 7.4528174, 29.032696, -74.72289, -28.132147, 23.44985, 5.766376]
2025-05-10 07:41:27,103 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [159.0, 61.0, 52.0, 24.0, 36.0, 50.0, 100.0, 114.0, 67.0, 38.0]
2025-05-10 07:41:27,107 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 24/100 (estimated time remaining: 3 hours, 42 minutes, 29 seconds)
2025-05-10 07:44:22,067 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 07:44:23,609 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -19.45147 ± 34.358
2025-05-10 07:44:23,610 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [10.5183525, -19.087034, 29.27475, -89.44783, -1.2070147, -11.434322, -11.818371, -74.24341, -6.2804575, -20.789406]
2025-05-10 07:44:23,610 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [127.0, 83.0, 78.0, 118.0, 56.0, 71.0, 92.0, 87.0, 64.0, 61.0]
2025-05-10 07:44:23,614 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 25/100 (estimated time remaining: 3 hours, 41 minutes, 5 seconds)
2025-05-10 07:47:08,755 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 07:47:12,239 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -38.34615 ± 83.239
2025-05-10 07:47:12,239 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-6.6364517, -273.6905, -58.208908, -60.00361, 3.5502677, 4.5459156, -36.864326, 18.035362, 13.986171, 11.8245735]
2025-05-10 07:47:12,239 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [115.0, 1000.0, 99.0, 91.0, 47.0, 50.0, 156.0, 28.0, 22.0, 37.0]
2025-05-10 07:47:12,244 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 26/100 (estimated time remaining: 3 hours, 35 minutes, 56 seconds)
2025-05-10 07:50:03,062 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 07:50:04,353 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -21.37583 ± 28.031
2025-05-10 07:50:04,354 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [9.958716, -66.298836, 8.52376, -13.355679, -17.924936, 11.876383, -51.33254, -6.302697, -61.658813, -27.243639]
2025-05-10 07:50:04,354 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [33.0, 107.0, 52.0, 58.0, 82.0, 51.0, 103.0, 61.0, 98.0, 57.0]
2025-05-10 07:50:04,358 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 27/100 (estimated time remaining: 3 hours, 32 minutes, 26 seconds)
2025-05-10 07:52:54,174 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 07:52:56,147 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -40.94480 ± 61.247
2025-05-10 07:52:56,147 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-28.804123, -189.06538, -47.717, 8.098613, -101.02154, 29.880198, 10.614934, -50.453693, -40.019547, -0.96045583]
2025-05-10 07:52:56,147 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [137.0, 230.0, 89.0, 95.0, 125.0, 55.0, 72.0, 78.0, 146.0, 36.0]
2025-05-10 07:52:56,152 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 28/100 (estimated time remaining: 3 hours, 28 minutes, 36 seconds)
2025-05-10 07:55:47,280 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 07:55:48,364 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 4.47645 ± 23.312
2025-05-10 07:55:48,364 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [24.410765, -51.314476, -1.091672, -13.614797, 38.246338, 2.883409, 2.3163376, 5.490299, 21.13725, 16.30107]
2025-05-10 07:55:48,364 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [44.0, 90.0, 72.0, 72.0, 78.0, 55.0, 73.0, 41.0, 39.0, 27.0]
2025-05-10 07:55:48,369 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 29/100 (estimated time remaining: 3 hours, 26 minutes, 42 seconds)
2025-05-10 07:58:55,814 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 07:58:57,133 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 2.02164 ± 23.531
2025-05-10 07:58:57,133 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [25.74821, 29.899559, -32.87686, 9.5958185, 30.432447, 26.010647, -22.719784, -13.241377, -19.583996, -13.048289]
2025-05-10 07:58:57,133 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [55.0, 58.0, 72.0, 127.0, 34.0, 47.0, 88.0, 87.0, 64.0, 86.0]
2025-05-10 07:58:57,139 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 30/100 (estimated time remaining: 3 hours, 26 minutes, 44 seconds)
2025-05-10 08:01:33,813 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 08:01:34,960 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 10.84316 ± 15.338
2025-05-10 08:01:34,960 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [5.604916, -20.911299, 8.491123, 14.296118, 0.2524442, 43.100903, 9.935376, 12.766289, 12.219544, 22.676231]
2025-05-10 08:01:34,960 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [70.0, 75.0, 48.0, 22.0, 77.0, 84.0, 80.0, 69.0, 51.0, 52.0]
2025-05-10 08:01:34,966 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 31/100 (estimated time remaining: 3 hours, 21 minutes, 18 seconds)
2025-05-10 08:04:25,351 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 08:04:28,818 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -8.31754 ± 31.756
2025-05-10 08:04:28,819 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [29.76613, -26.36571, -45.1864, 22.684818, 14.583846, -32.72986, -69.946625, -3.3442447, 7.4981775, 19.864428]
2025-05-10 08:04:28,819 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 85.0, 108.0, 37.0, 23.0, 83.0, 105.0, 87.0, 82.0, 24.0]
2025-05-10 08:04:28,824 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 32/100 (estimated time remaining: 3 hours, 18 minutes, 49 seconds)
2025-05-10 08:07:29,345 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 08:07:30,915 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -23.46439 ± 43.796
2025-05-10 08:07:30,916 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [19.235397, -44.955807, -33.694324, -134.83018, -39.4887, 15.783294, 9.556969, 14.980852, -18.754362, -22.47707]
2025-05-10 08:07:30,916 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [60.0, 81.0, 128.0, 160.0, 113.0, 67.0, 39.0, 42.0, 74.0, 86.0]
2025-05-10 08:07:30,921 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 33/100 (estimated time remaining: 3 hours, 18 minutes, 16 seconds)
2025-05-10 08:10:19,010 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 08:10:20,371 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -3.52813 ± 56.181
2025-05-10 08:10:20,372 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [25.759176, 22.078836, 7.3471045, 12.17439, 13.906572, 11.578282, -169.34023, -8.660098, 22.485756, 27.388874]
2025-05-10 08:10:20,372 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [51.0, 34.0, 65.0, 47.0, 109.0, 75.0, 169.0, 90.0, 42.0, 57.0]
2025-05-10 08:10:20,378 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 34/100 (estimated time remaining: 3 hours, 14 minutes, 44 seconds)
2025-05-10 08:13:02,000 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 08:13:03,177 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 0.14105 ± 24.272
2025-05-10 08:13:03,178 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [19.817698, 37.55962, -27.161926, 24.468214, 15.945706, -7.4967256, -13.575841, -29.132685, -34.27922, 15.265657]
2025-05-10 08:13:03,178 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [22.0, 64.0, 63.0, 51.0, 42.0, 58.0, 65.0, 90.0, 127.0, 56.0]
2025-05-10 08:13:03,184 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 35/100 (estimated time remaining: 3 hours, 6 minutes, 7 seconds)
2025-05-10 08:15:54,088 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 08:16:02,261 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -74.15559 ± 111.782
2025-05-10 08:16:02,261 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [39.762817, -146.26294, -5.2549124, -144.21252, 36.484158, -116.237595, -19.39348, -34.43153, -345.98206, -6.0277934]
2025-05-10 08:16:02,261 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [61.0, 1000.0, 1000.0, 174.0, 80.0, 133.0, 118.0, 70.0, 1000.0, 76.0]
2025-05-10 08:16:02,268 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 36/100 (estimated time remaining: 3 hours, 7 minutes, 54 seconds)
2025-05-10 08:19:00,533 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 08:19:04,043 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -14.97527 ± 43.542
2025-05-10 08:19:04,043 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [2.5868049, 24.942852, 24.28913, -31.851473, -8.962982, 20.768114, -102.27929, -88.091805, -5.492722, 14.338706]
2025-05-10 08:19:04,043 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [77.0, 34.0, 73.0, 75.0, 83.0, 35.0, 118.0, 1000.0, 53.0, 39.0]
2025-05-10 08:19:04,049 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 37/100 (estimated time remaining: 3 hours, 6 minutes, 42 seconds)
2025-05-10 08:22:08,293 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 08:22:14,222 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -33.49448 ± 50.301
2025-05-10 08:22:14,222 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-40.50984, 24.231443, -44.743294, -24.652254, -2.5391173, 24.722277, -121.49349, -126.56736, -9.272566, -14.12064]
2025-05-10 08:22:14,222 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [97.0, 53.0, 118.0, 95.0, 49.0, 62.0, 117.0, 1000.0, 1000.0, 62.0]
2025-05-10 08:22:14,229 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 38/100 (estimated time remaining: 3 hours, 5 minutes, 29 seconds)
2025-05-10 08:25:03,354 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 08:25:05,139 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -26.49911 ± 63.919
2025-05-10 08:25:05,139 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-131.40602, 10.4375305, 3.4002683, 1.6446109, -2.9483244, 15.700836, 25.626305, -21.934027, 4.986234, -170.49855]
2025-05-10 08:25:05,139 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [178.0, 74.0, 53.0, 59.0, 72.0, 26.0, 60.0, 101.0, 80.0, 221.0]
2025-05-10 08:25:05,146 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 39/100 (estimated time remaining: 3 hours, 2 minutes, 51 seconds)
2025-05-10 08:28:01,885 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 08:28:05,454 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -8.55901 ± 54.664
2025-05-10 08:28:05,454 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [2.791959, 36.870636, -10.66056, 31.566425, -158.17265, 25.740812, 15.904224, 25.67314, -25.004932, -30.299154]
2025-05-10 08:28:05,454 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [128.0, 79.0, 69.0, 73.0, 1000.0, 99.0, 80.0, 67.0, 74.0, 56.0]
2025-05-10 08:28:05,461 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 40/100 (estimated time remaining: 3 hours, 3 minutes, 27 seconds)
2025-05-10 08:30:40,050 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 08:30:43,486 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -0.01399 ± 35.736
2025-05-10 08:30:43,486 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [19.65522, -91.799545, 7.0600843, 39.537277, 3.8936977, 26.273842, -35.686203, 12.466596, 13.649585, 4.809509]
2025-05-10 08:30:43,486 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [53.0, 169.0, 59.0, 57.0, 86.0, 47.0, 1000.0, 22.0, 59.0, 48.0]
2025-05-10 08:30:43,493 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 41/100 (estimated time remaining: 2 hours, 56 minutes, 14 seconds)
2025-05-10 08:33:35,354 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 08:33:40,854 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -21.89794 ± 51.348
2025-05-10 08:33:40,855 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-22.213406, -57.107594, 12.651966, -67.62019, 18.191607, 26.594614, 23.53716, 20.147924, -33.272346, -139.88916]
2025-05-10 08:33:40,855 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [121.0, 123.0, 23.0, 1000.0, 22.0, 61.0, 50.0, 59.0, 71.0, 1000.0]
2025-05-10 08:33:40,862 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 42/100 (estimated time remaining: 2 hours, 52 minutes, 26 seconds)
2025-05-10 08:36:32,497 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 08:36:36,398 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -8.93173 ± 64.381
2025-05-10 08:36:36,398 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [13.021781, -136.54869, 18.286074, 38.64046, -94.22711, -17.104036, 108.05164, -22.645802, -9.341979, 12.550386]
2025-05-10 08:36:36,398 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [128.0, 162.0, 71.0, 52.0, 184.0, 68.0, 1000.0, 104.0, 58.0, 72.0]
2025-05-10 08:36:36,405 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 43/100 (estimated time remaining: 2 hours, 46 minutes, 41 seconds)
2025-05-10 08:39:29,194 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 08:39:30,795 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -5.93109 ± 42.165
2025-05-10 08:39:30,795 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [42.856297, -101.55103, -28.58247, 14.086325, 15.888112, -47.483982, 8.030093, -6.364163, 48.307552, -4.4976716]
2025-05-10 08:39:30,795 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [53.0, 167.0, 102.0, 57.0, 54.0, 99.0, 45.0, 108.0, 68.0, 111.0]
2025-05-10 08:39:30,803 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 44/100 (estimated time remaining: 2 hours, 44 minutes, 28 seconds)
2025-05-10 08:42:22,753 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 08:42:28,617 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 10.41522 ± 34.141
2025-05-10 08:42:28,617 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-32.591927, 8.921987, 1.9845973, 17.09069, 26.860085, 17.208088, 14.904245, -15.559637, 94.6627, -29.328676]
2025-05-10 08:42:28,617 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [119.0, 48.0, 74.0, 61.0, 90.0, 117.0, 25.0, 109.0, 1000.0, 1000.0]
2025-05-10 08:42:28,624 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 45/100 (estimated time remaining: 2 hours, 41 minutes, 7 seconds)
2025-05-10 08:45:31,243 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 08:45:32,678 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 6.42578 ± 31.034
2025-05-10 08:45:32,678 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [3.787136, 41.7552, 26.986988, 31.043165, 7.9999766, -41.311375, 23.308205, -8.83571, -54.91765, 34.441853]
2025-05-10 08:45:32,678 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [80.0, 54.0, 47.0, 43.0, 72.0, 100.0, 83.0, 118.0, 126.0, 51.0]
2025-05-10 08:45:32,686 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 46/100 (estimated time remaining: 2 hours, 43 minutes, 1 second)
2025-05-10 08:48:11,745 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 08:48:13,177 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 18.23250 ± 11.725
2025-05-10 08:48:13,177 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [25.233107, -11.379864, 24.538452, 7.459276, 26.338427, 23.292639, 12.178586, 22.582531, 22.569965, 29.511875]
2025-05-10 08:48:13,177 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [62.0, 115.0, 64.0, 117.0, 50.0, 56.0, 95.0, 83.0, 45.0, 86.0]
2025-05-10 08:48:13,177 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1226 [INFO]: New best (18.23) for latency MM1Queue_a033_s075
2025-05-10 08:48:13,177 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1229 [INFO]: saving network
2025-05-10 08:48:13,181 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 08:48:13,194 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 47/100 (estimated time remaining: 2 hours, 37 minutes, 1 second)
2025-05-10 08:51:19,094 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 08:51:20,616 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 5.09479 ± 34.757
2025-05-10 08:51:20,617 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [40.68949, 19.809591, 6.1927395, 17.263596, 26.592781, 27.170227, -6.996117, -91.38686, 10.391752, 1.2207334]
2025-05-10 08:51:20,617 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [68.0, 78.0, 86.0, 64.0, 65.0, 79.0, 79.0, 141.0, 54.0, 102.0]
2025-05-10 08:51:20,625 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 48/100 (estimated time remaining: 2 hours, 36 minutes, 12 seconds)
2025-05-10 08:54:00,510 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 08:54:08,651 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -24.17722 ± 86.930
2025-05-10 08:54:08,652 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [32.118114, -245.57773, -54.190826, -64.76525, 31.829618, 69.77832, -26.963257, 19.741028, -56.350094, 52.60785]
2025-05-10 08:54:08,652 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [52.0, 1000.0, 1000.0, 91.0, 58.0, 134.0, 1000.0, 35.0, 138.0, 76.0]
2025-05-10 08:54:08,660 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 49/100 (estimated time remaining: 2 hours, 32 minutes, 9 seconds)
2025-05-10 08:57:09,535 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 08:57:11,185 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -7.54365 ± 41.727
2025-05-10 08:57:11,185 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-114.53268, -13.541997, 20.809397, 14.83581, 33.484352, -46.44797, 11.251909, -3.3340282, 24.565235, -2.5265708]
2025-05-10 08:57:11,185 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [202.0, 101.0, 45.0, 74.0, 83.0, 107.0, 81.0, 66.0, 49.0, 72.0]
2025-05-10 08:57:11,193 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 50/100 (estimated time remaining: 2 hours, 30 minutes, 2 seconds)
2025-05-10 09:00:05,216 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 09:00:09,179 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -19.70350 ± 51.400
2025-05-10 09:00:09,179 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-77.72149, 8.819327, 21.273378, -99.60538, -70.62012, -69.209984, 50.20885, -4.2026186, 5.8120666, 38.211033]
2025-05-10 09:00:09,180 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [158.0, 80.0, 58.0, 1000.0, 131.0, 195.0, 79.0, 74.0, 63.0, 61.0]
2025-05-10 09:00:09,188 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 51/100 (estimated time remaining: 2 hours, 26 minutes, 5 seconds)
2025-05-10 09:02:53,879 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 09:02:57,640 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 8.65945 ± 40.521
2025-05-10 09:02:57,640 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [12.695604, 24.253347, 40.239647, -86.54952, 20.18633, -33.201366, 40.71407, -0.54953235, 4.610524, 64.195435]
2025-05-10 09:02:57,640 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [91.0, 73.0, 102.0, 120.0, 1000.0, 121.0, 54.0, 63.0, 38.0, 92.0]
2025-05-10 09:02:57,649 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 52/100 (estimated time remaining: 2 hours, 24 minutes, 27 seconds)
2025-05-10 09:05:55,273 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 09:05:57,302 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 3.26072 ± 46.392
2025-05-10 09:05:57,303 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [27.133045, -70.32485, 17.787968, 44.435608, 49.996113, -14.404614, -88.702934, -9.946712, 43.057186, 33.576397]
2025-05-10 09:05:57,303 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [95.0, 159.0, 104.0, 97.0, 59.0, 126.0, 138.0, 119.0, 112.0, 71.0]
2025-05-10 09:05:57,311 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 53/100 (estimated time remaining: 2 hours, 20 minutes, 16 seconds)
2025-05-10 09:08:48,904 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 09:08:50,691 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -1.76681 ± 44.893
2025-05-10 09:08:50,691 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-18.617271, 39.18501, 13.5869055, 16.178972, 23.3148, 7.38821, 3.8344853, -125.51278, 36.248306, -13.274744]
2025-05-10 09:08:50,691 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [115.0, 50.0, 91.0, 79.0, 23.0, 71.0, 113.0, 219.0, 76.0, 113.0]
2025-05-10 09:08:50,700 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 54/100 (estimated time remaining: 2 hours, 18 minutes, 11 seconds)
2025-05-10 09:11:39,628 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 09:11:41,611 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -1.14768 ± 48.865
2025-05-10 09:11:41,611 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [32.0152, 15.592514, -36.490314, 26.993265, 40.860897, 21.204374, 32.007145, -131.05667, -15.3553705, 2.7521393]
2025-05-10 09:11:41,612 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [71.0, 107.0, 121.0, 71.0, 80.0, 135.0, 67.0, 139.0, 79.0, 185.0]
2025-05-10 09:11:41,621 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 55/100 (estimated time remaining: 2 hours, 13 minutes, 27 seconds)
2025-05-10 09:14:43,632 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 09:14:48,086 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -15.37671 ± 79.693
2025-05-10 09:14:48,086 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-103.24058, 60.840973, 3.6171396, -8.0212, 20.803867, -199.17117, 66.02584, -49.18565, -11.875414, 66.43911]
2025-05-10 09:14:48,086 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [181.0, 89.0, 92.0, 203.0, 132.0, 1000.0, 96.0, 114.0, 98.0, 117.0]
2025-05-10 09:14:48,095 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 56/100 (estimated time remaining: 2 hours, 11 minutes, 50 seconds)
2025-05-10 09:17:32,750 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 09:17:38,271 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -44.64404 ± 162.249
2025-05-10 09:17:38,271 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [18.61668, 26.900957, 28.62785, -27.175608, -516.9018, 3.5114505, 29.117853, 38.206333, -94.110214, 46.766106]
2025-05-10 09:17:38,271 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [23.0, 58.0, 58.0, 1000.0, 1000.0, 56.0, 82.0, 60.0, 86.0, 39.0]
2025-05-10 09:17:38,280 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 57/100 (estimated time remaining: 2 hours, 9 minutes, 9 seconds)
2025-05-10 09:20:28,540 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 09:20:37,111 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -64.16084 ± 80.746
2025-05-10 09:20:37,112 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-31.701641, -95.811066, 4.30439, -249.89777, -37.70049, -104.98678, 11.530186, 18.371359, -11.995473, -143.72115]
2025-05-10 09:20:37,112 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [135.0, 204.0, 65.0, 1000.0, 131.0, 1000.0, 150.0, 24.0, 88.0, 1000.0]
2025-05-10 09:20:37,121 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 58/100 (estimated time remaining: 2 hours, 6 minutes, 6 seconds)
2025-05-10 09:23:29,336 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 09:23:30,933 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 12.96601 ± 15.815
2025-05-10 09:23:30,934 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [11.315011, -19.391481, 7.781366, 21.097689, 7.780575, 29.781012, 32.503677, 33.626297, 6.089382, -0.9234818]
2025-05-10 09:23:30,934 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [76.0, 92.0, 138.0, 35.0, 120.0, 55.0, 70.0, 38.0, 113.0, 103.0]
2025-05-10 09:23:30,943 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 59/100 (estimated time remaining: 2 hours, 3 minutes, 14 seconds)
2025-05-10 09:26:23,316 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 09:26:29,533 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -58.04989 ± 112.679
2025-05-10 09:26:29,533 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-2.4699154, 20.220016, -212.86905, -180.76324, 31.908724, 31.052832, -254.3234, -103.79144, 26.201601, 64.33505]
2025-05-10 09:26:29,533 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [85.0, 24.0, 229.0, 1000.0, 58.0, 136.0, 1000.0, 117.0, 39.0, 88.0]
2025-05-10 09:26:29,543 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 1 minute, 20 seconds)
2025-05-10 09:29:22,249 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 09:29:28,479 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 37.43170 ± 56.542
2025-05-10 09:29:28,479 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [23.24289, -21.325966, 110.63396, 12.280376, -24.460909, 153.5762, -23.832848, 23.995005, 55.506844, 64.701454]
2025-05-10 09:29:28,479 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [133.0, 82.0, 1000.0, 66.0, 96.0, 1000.0, 80.0, 122.0, 88.0, 113.0]
2025-05-10 09:29:28,480 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1226 [INFO]: New best (37.43) for latency MM1Queue_a033_s075
2025-05-10 09:29:28,480 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1229 [INFO]: saving network
2025-05-10 09:29:28,484 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 09:29:28,499 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 57 minutes, 23 seconds)
2025-05-10 09:32:19,009 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 09:32:20,541 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 1.74574 ± 53.421
2025-05-10 09:32:20,541 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [45.494858, 20.868908, 20.884672, 28.736073, 2.2217882, 25.594639, 20.805058, -152.43114, 17.790485, -12.507907]
2025-05-10 09:32:20,541 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [65.0, 25.0, 94.0, 41.0, 63.0, 39.0, 85.0, 146.0, 102.0, 148.0]
2025-05-10 09:32:20,552 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 54 minutes, 41 seconds)
2025-05-10 09:35:11,538 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 09:35:15,547 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -37.91827 ± 97.763
2025-05-10 09:35:15,547 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [3.3407917, 36.945915, 9.661374, 34.559433, -105.124115, -294.39252, 6.233052, 47.175224, -72.85315, -44.72869]
2025-05-10 09:35:15,547 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [87.0, 75.0, 61.0, 102.0, 111.0, 1000.0, 137.0, 73.0, 124.0, 80.0]
2025-05-10 09:35:15,558 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 51 minutes, 16 seconds)
2025-05-10 09:38:15,013 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 09:38:16,150 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 10.62740 ± 22.149
2025-05-10 09:38:16,150 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [19.58314, 24.386526, 42.819824, 7.0581255, -33.979164, 18.274761, 23.270823, 11.23624, -25.512743, 19.136446]
2025-05-10 09:38:16,150 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [26.0, 83.0, 73.0, 96.0, 83.0, 24.0, 116.0, 21.0, 55.0, 22.0]
2025-05-10 09:38:16,161 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 49 minutes, 10 seconds)
2025-05-10 09:41:10,690 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 09:41:16,545 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 31.21322 ± 35.246
2025-05-10 09:41:16,545 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [19.574064, 80.468666, -12.05246, 35.032406, 92.37119, -6.2450743, 52.023987, -13.014415, 17.134079, 46.83976]
2025-05-10 09:41:16,546 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [23.0, 1000.0, 107.0, 126.0, 1000.0, 105.0, 63.0, 69.0, 24.0, 70.0]
2025-05-10 09:41:16,556 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 46 minutes, 26 seconds)
2025-05-10 09:43:59,488 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 09:44:01,217 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 14.57723 ± 30.046
2025-05-10 09:44:01,217 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [34.053303, 48.561104, 2.4417791, 32.669025, 39.787357, 9.041446, -46.87519, 14.379641, -28.95779, 40.671642]
2025-05-10 09:44:01,217 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [62.0, 73.0, 104.0, 90.0, 72.0, 66.0, 183.0, 68.0, 110.0, 77.0]
2025-05-10 09:44:01,228 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 41 minutes, 49 seconds)
2025-05-10 09:47:05,179 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 09:47:09,068 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -8.38039 ± 54.515
2025-05-10 09:47:09,068 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [24.36218, -152.63832, -36.014954, 33.931114, -42.44203, 32.479385, 27.176968, 20.198597, 0.30765188, 8.835558]
2025-05-10 09:47:09,068 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [110.0, 144.0, 93.0, 41.0, 134.0, 44.0, 55.0, 78.0, 1000.0, 121.0]
2025-05-10 09:47:09,079 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 40 minutes, 41 seconds)
2025-05-10 09:50:02,129 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 09:50:05,997 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -5.05793 ± 75.603
2025-05-10 09:50:05,997 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [47.29223, 34.054436, 41.427532, -203.4238, 60.954273, 19.308405, 31.378098, -48.789703, -54.666878, 21.88607]
2025-05-10 09:50:05,997 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [77.0, 83.0, 76.0, 200.0, 84.0, 23.0, 70.0, 1000.0, 145.0, 39.0]
2025-05-10 09:50:06,008 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 37 minutes, 56 seconds)
2025-05-10 09:52:59,026 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 09:53:05,547 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -46.95679 ± 91.293
2025-05-10 09:53:05,547 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-17.563984, -72.46441, -0.269067, 17.912706, -2.881836, 13.25954, -278.66116, -2.480275, 19.494696, -145.91406]
2025-05-10 09:53:05,547 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [93.0, 102.0, 100.0, 241.0, 1000.0, 24.0, 1000.0, 104.0, 125.0, 151.0]
2025-05-10 09:53:05,558 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 34 minutes, 52 seconds)
2025-05-10 09:55:55,509 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 09:56:01,701 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -54.57444 ± 152.134
2025-05-10 09:56:01,701 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-238.24579, 5.8125234, 26.863482, 40.799095, 34.248188, 29.258047, 23.91573, -441.33173, 32.894436, -59.958473]
2025-05-10 09:56:01,701 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 101.0, 197.0, 98.0, 94.0, 66.0, 24.0, 1000.0, 34.0, 167.0]
2025-05-10 09:56:01,712 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 31 minutes, 27 seconds)
2025-05-10 09:58:41,854 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 09:58:50,175 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -8.22059 ± 58.808
2025-05-10 09:58:50,175 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-28.80011, -152.8717, 38.670605, 19.845432, -70.48442, 47.940174, 32.559265, -3.4123912, 28.21352, 6.133726]
2025-05-10 09:58:50,175 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 189.0, 1000.0, 34.0, 112.0, 66.0, 87.0, 141.0, 33.0, 1000.0]
2025-05-10 09:58:50,187 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 28 minutes, 53 seconds)
2025-05-10 10:01:40,581 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 10:01:41,853 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 18.59372 ± 21.793
2025-05-10 10:01:41,854 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [0.6170975, 14.874981, 23.360773, 31.029015, 1.5422672, 16.52806, 67.57819, -18.760662, 18.517721, 30.649765]
2025-05-10 10:01:41,854 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [81.0, 88.0, 58.0, 51.0, 79.0, 41.0, 86.0, 62.0, 41.0, 85.0]
2025-05-10 10:01:41,865 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 24 minutes, 22 seconds)
2025-05-10 10:04:43,260 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 10:04:47,815 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -0.92062 ± 33.781
2025-05-10 10:04:47,815 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-32.298977, 51.4921, -51.18771, 2.11797, 20.367783, 32.57861, -11.191618, -20.80057, -39.29777, 39.013996]
2025-05-10 10:04:47,815 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [315.0, 66.0, 1000.0, 74.0, 73.0, 80.0, 164.0, 153.0, 112.0, 85.0]
2025-05-10 10:04:47,827 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 22 minutes, 18 seconds)
2025-05-10 10:07:27,999 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 10:07:29,896 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -3.78461 ± 47.243
2025-05-10 10:07:29,897 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [29.341326, 19.250286, -13.212306, 22.42932, 1.2412986, -128.86554, 29.522818, -31.872213, -9.943276, 44.262188]
2025-05-10 10:07:29,897 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [167.0, 23.0, 73.0, 56.0, 71.0, 181.0, 71.0, 103.0, 128.0, 116.0]
2025-05-10 10:07:29,909 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 17 minutes, 47 seconds)
2025-05-10 10:10:21,290 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 10:10:27,299 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -6.19349 ± 149.105
2025-05-10 10:10:27,299 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [101.75595, 25.31363, 36.613, -102.93961, -423.47958, 88.51993, 56.25809, 75.67304, 55.7141, 24.636566]
2025-05-10 10:10:27,299 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [234.0, 157.0, 161.0, 182.0, 1000.0, 291.0, 195.0, 174.0, 103.0, 351.0]
2025-05-10 10:10:27,311 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 15 minutes, 1 second)
2025-05-10 10:13:17,906 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 10:13:19,871 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -24.73506 ± 49.165
2025-05-10 10:13:19,871 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [3.5864277, -136.40196, -1.3096678, 12.818357, -69.652016, -22.803268, 26.084408, -70.6919, -8.568173, 19.587187]
2025-05-10 10:13:19,871 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [114.0, 192.0, 55.0, 39.0, 153.0, 105.0, 88.0, 111.0, 91.0, 80.0]
2025-05-10 10:13:19,883 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 12 minutes, 28 seconds)
2025-05-10 10:16:15,001 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 10:16:21,299 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 27.44458 ± 218.867
2025-05-10 10:16:21,299 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [54.874016, -36.41777, 552.6687, 1.5883062, -66.708176, 15.410813, 30.455482, 96.166435, -401.42407, 27.832016]
2025-05-10 10:16:21,299 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [73.0, 74.0, 1000.0, 83.0, 107.0, 138.0, 56.0, 181.0, 1000.0, 96.0]
2025-05-10 10:16:21,312 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 10 minutes, 21 seconds)
2025-05-10 10:19:12,329 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 10:19:21,339 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -10.12924 ± 70.224
2025-05-10 10:19:21,339 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-16.738379, -24.082577, 48.711594, 32.035316, 35.968365, -14.961689, -209.23708, 14.236665, 21.058619, 11.716796]
2025-05-10 10:19:21,339 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 69.0, 237.0, 100.0, 53.0, 114.0, 307.0, 1000.0, 177.0, 1000.0]
2025-05-10 10:19:21,352 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 6 minutes, 58 seconds)
2025-05-10 10:22:21,089 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 10:22:23,378 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 21.53206 ± 47.234
2025-05-10 10:22:23,378 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-99.04696, 6.856353, 45.225285, 25.191555, 41.905575, -16.152855, 74.8002, 31.650244, 64.57735, 40.313908]
2025-05-10 10:22:23,378 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [313.0, 141.0, 156.0, 54.0, 59.0, 88.0, 126.0, 66.0, 119.0, 78.0]
2025-05-10 10:22:23,391 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 5 minutes, 31 seconds)
2025-05-10 10:25:15,052 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 10:25:21,640 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -56.81775 ± 187.273
2025-05-10 10:25:21,640 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [88.410286, -390.4024, 31.520432, -12.9022, 15.053456, 39.32385, 42.436188, -461.2806, 7.6068873, 72.05662]
2025-05-10 10:25:21,640 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [136.0, 1000.0, 101.0, 149.0, 70.0, 52.0, 181.0, 1000.0, 176.0, 171.0]
2025-05-10 10:25:21,653 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 2 minutes, 36 seconds)
2025-05-10 10:28:12,707 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 10:28:14,767 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 15.31878 ± 25.769
2025-05-10 10:28:14,767 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [23.468409, 40.863888, 29.292952, -41.44197, 4.5563145, -2.3530943, 40.418594, 34.403122, 35.012524, -11.032963]
2025-05-10 10:28:14,767 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [105.0, 57.0, 86.0, 124.0, 128.0, 125.0, 127.0, 118.0, 83.0, 129.0]
2025-05-10 10:28:14,780 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 81/100 (estimated time remaining: 59 minutes, 39 seconds)
2025-05-10 10:30:58,374 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 10:31:03,008 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 8.08111 ± 88.736
2025-05-10 10:31:03,009 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [25.253952, 32.43196, 2.0125644, 17.331257, 0.47332004, 12.604223, 215.1064, -170.08426, -43.978603, -10.33973]
2025-05-10 10:31:03,009 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [108.0, 115.0, 82.0, 123.0, 127.0, 81.0, 1000.0, 253.0, 197.0, 131.0]
2025-05-10 10:31:03,022 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 82/100 (estimated time remaining: 55 minutes, 50 seconds)
2025-05-10 10:34:03,808 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 10:34:15,702 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 48.63419 ± 63.115
2025-05-10 10:34:15,702 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [97.13747, 19.556662, 42.890034, 34.831745, 13.948054, 2.1367884, -34.059105, 6.676471, 111.3539, 191.8699]
2025-05-10 10:34:15,702 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 225.0, 553.0, 519.0, 426.0, 247.0, 255.0, 262.0, 1000.0, 1000.0]
2025-05-10 10:34:15,702 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1226 [INFO]: New best (48.63) for latency MM1Queue_a033_s075
2025-05-10 10:34:15,702 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1229 [INFO]: saving network
2025-05-10 10:34:15,706 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 10:34:15,725 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 83/100 (estimated time remaining: 53 minutes, 39 seconds)
2025-05-10 10:37:04,359 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 10:37:06,709 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -4.50113 ± 25.415
2025-05-10 10:37:06,710 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-62.00976, 6.2924237, 20.224861, -34.74387, 10.895476, -11.507587, 22.688438, -11.613769, -1.3046025, 16.06712]
2025-05-10 10:37:06,710 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [155.0, 99.0, 58.0, 132.0, 101.0, 138.0, 149.0, 96.0, 191.0, 114.0]
2025-05-10 10:37:06,723 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 84/100 (estimated time remaining: 50 minutes, 3 seconds)
2025-05-10 10:39:53,579 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 10:39:57,457 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 14.73689 ± 33.672
2025-05-10 10:39:57,457 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-5.9433584, 7.7366433, 21.441757, 33.59942, 14.538953, 95.89145, -32.807632, 27.499796, -22.7972, 8.209071]
2025-05-10 10:39:57,457 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [173.0, 96.0, 73.0, 80.0, 24.0, 1000.0, 124.0, 65.0, 116.0, 102.0]
2025-05-10 10:39:57,471 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 85/100 (estimated time remaining: 46 minutes, 42 seconds)
2025-05-10 10:42:50,296 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 10:42:53,370 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 15.74340 ± 37.095
2025-05-10 10:42:53,370 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-56.236954, -54.787704, 37.330963, 42.903374, 33.53525, 18.264212, 33.63442, 50.231117, 39.443573, 13.115694]
2025-05-10 10:42:53,370 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [179.0, 249.0, 142.0, 143.0, 158.0, 166.0, 96.0, 171.0, 108.0, 172.0]
2025-05-10 10:42:53,384 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 86/100 (estimated time remaining: 43 minutes, 55 seconds)
2025-05-10 10:46:01,549 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 10:46:04,161 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 28.82436 ± 41.287
2025-05-10 10:46:04,161 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [52.86444, 60.377167, 56.097996, -68.48272, 86.81398, 8.428301, 28.30061, 50.174694, 12.213638, 1.4555336]
2025-05-10 10:46:04,161 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [96.0, 84.0, 125.0, 255.0, 147.0, 94.0, 123.0, 79.0, 102.0, 258.0]
2025-05-10 10:46:04,175 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 87/100 (estimated time remaining: 42 minutes, 3 seconds)
2025-05-10 10:48:41,487 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 10:48:43,900 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 17.96549 ± 30.130
2025-05-10 10:48:43,900 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [22.616379, 57.571186, 40.080116, 22.40993, 63.763256, 10.680108, 18.983938, -1.385816, -15.019542, -40.04468]
2025-05-10 10:48:43,900 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [21.0, 112.0, 106.0, 199.0, 146.0, 136.0, 126.0, 149.0, 114.0, 157.0]
2025-05-10 10:48:43,914 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 88/100 (estimated time remaining: 37 minutes, 37 seconds)
2025-05-10 10:51:34,911 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 10:51:37,928 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 3.74228 ± 55.550
2025-05-10 10:51:37,928 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-10.043635, 7.9270825, 23.794865, 30.696775, 45.240036, 10.3342705, 24.603558, 35.946842, -156.87492, 25.797953]
2025-05-10 10:51:37,928 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [385.0, 72.0, 208.0, 107.0, 106.0, 122.0, 72.0, 141.0, 214.0, 127.0]
2025-05-10 10:51:37,943 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 89/100 (estimated time remaining: 34 minutes, 50 seconds)
2025-05-10 10:54:31,334 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 10:54:40,265 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 34.58855 ± 25.591
2025-05-10 10:54:40,266 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [29.411911, -25.559946, 39.93674, 28.04293, 32.268574, 81.00766, 41.262238, 47.296597, 51.311558, 20.907274]
2025-05-10 10:54:40,266 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [158.0, 1000.0, 131.0, 1000.0, 116.0, 189.0, 1000.0, 126.0, 123.0, 192.0]
2025-05-10 10:54:40,280 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 90/100 (estimated time remaining: 32 minutes, 22 seconds)
2025-05-10 10:57:42,609 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 10:58:01,378 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 72.64189 ± 62.008
2025-05-10 10:58:01,379 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [159.51662, 126.90508, 84.429665, 19.724037, -51.68633, 101.51106, 40.54776, 126.918045, 106.15581, 12.39719]
2025-05-10 10:58:01,379 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 23.0, 206.0, 1000.0, 1000.0, 1000.0, 1000.0, 990.0]
2025-05-10 10:58:01,379 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1226 [INFO]: New best (72.64) for latency MM1Queue_a033_s075
2025-05-10 10:58:01,379 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1229 [INFO]: saving network
2025-05-10 10:58:01,383 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 10:58:01,403 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 91/100 (estimated time remaining: 30 minutes, 16 seconds)
2025-05-10 11:00:44,700 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 11:00:57,523 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 22.25403 ± 60.808
2025-05-10 11:00:57,523 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-38.689342, 24.750132, 108.830185, -14.521787, 103.815125, 114.77017, -38.947506, -15.390425, -41.183567, 19.10734]
2025-05-10 11:00:57,523 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [603.0, 29.0, 1000.0, 437.0, 1000.0, 1000.0, 741.0, 397.0, 572.0, 22.0]
2025-05-10 11:00:57,538 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 92/100 (estimated time remaining: 26 minutes, 48 seconds)
2025-05-10 11:03:52,842 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 11:04:13,607 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 101.99809 ± 33.769
2025-05-10 11:04:13,607 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [122.001755, 84.28594, 83.61985, 98.7523, 158.12323, 95.42202, 22.54142, 115.24477, 117.59421, 122.39531]
2025-05-10 11:04:13,607 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 24.0, 1000.0, 1000.0, 1000.0]
2025-05-10 11:04:13,608 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1226 [INFO]: New best (102.00) for latency MM1Queue_a033_s075
2025-05-10 11:04:13,608 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1229 [INFO]: saving network
2025-05-10 11:04:13,612 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-ant/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 11:04:13,632 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 93/100 (estimated time remaining: 24 minutes, 47 seconds)
2025-05-10 11:07:11,082 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 11:07:28,499 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 93.46526 ± 120.775
2025-05-10 11:07:28,499 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [292.34262, 15.142898, 24.798395, 45.882656, 269.02396, 14.035306, 17.57805, 4.4998293, -16.315601, 267.66455]
2025-05-10 11:07:28,499 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 445.0, 1000.0, 745.0, 1000.0, 22.0, 1000.0, 535.0, 1000.0, 1000.0]
2025-05-10 11:07:28,515 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 94/100 (estimated time remaining: 22 minutes, 10 seconds)
2025-05-10 11:10:18,578 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 11:10:21,067 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 30.59533 ± 20.027
2025-05-10 11:10:21,067 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [17.51225, 36.871754, 58.43884, 18.925484, 26.432327, 2.6512344, 60.50838, 40.055054, 0.27796084, 44.279976]
2025-05-10 11:10:21,068 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [22.0, 138.0, 214.0, 22.0, 138.0, 36.0, 194.0, 212.0, 175.0, 119.0]
2025-05-10 11:10:21,083 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 95/100 (estimated time remaining: 18 minutes, 48 seconds)
2025-05-10 11:13:26,309 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 11:13:33,866 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 41.45618 ± 35.068
2025-05-10 11:13:33,867 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [124.75924, 40.065033, 29.803808, -6.627325, 34.77769, 36.257656, -4.511163, 40.28101, 64.552376, 55.203423]
2025-05-10 11:13:33,867 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 268.0, 165.0, 222.0, 1000.0, 230.0, 175.0, 134.0, 119.0, 179.0]
2025-05-10 11:13:33,882 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 96/100 (estimated time remaining: 15 minutes, 32 seconds)
2025-05-10 11:16:21,710 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 11:16:27,669 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 17.13202 ± 38.120
2025-05-10 11:16:27,669 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [17.061872, 42.508896, 51.37576, 37.203976, -57.946014, -49.140587, 20.70365, 18.40479, 27.536068, 63.61176]
2025-05-10 11:16:27,669 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [252.0, 168.0, 281.0, 385.0, 355.0, 331.0, 231.0, 444.0, 243.0, 299.0]
2025-05-10 11:16:27,685 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 97/100 (estimated time remaining: 12 minutes, 24 seconds)
2025-05-10 11:19:10,938 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 11:19:31,947 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -150.06287 ± 56.905
2025-05-10 11:19:31,948 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-177.838, -160.0091, -180.5931, 15.179973, -144.23125, -191.32329, -176.31328, -176.87273, -147.4509, -161.17697]
2025-05-10 11:19:31,948 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 44.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 11:19:31,964 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 98/100 (estimated time remaining: 9 minutes, 10 seconds)
2025-05-10 11:22:23,995 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 11:22:47,095 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -338.85913 ± 24.343
2025-05-10 11:22:47,095 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-348.80933, -352.56006, -328.43185, -329.5147, -317.81445, -400.457, -343.72208, -323.16937, -335.68643, -308.42584]
2025-05-10 11:22:47,096 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 11:22:47,111 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 99/100 (estimated time remaining: 6 minutes, 7 seconds)
2025-05-10 11:25:42,805 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 11:25:48,071 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: 32.12801 ± 40.529
2025-05-10 11:25:48,071 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-8.178329, 120.691536, 39.208538, 19.294344, 59.893986, -19.42967, 37.98803, -18.107084, 30.35309, 59.565617]
2025-05-10 11:25:48,071 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [230.0, 1000.0, 145.0, 147.0, 166.0, 187.0, 120.0, 195.0, 156.0, 152.0]
2025-05-10 11:25:48,087 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1199 [INFO]: Iteration 100/100 (estimated time remaining: 3 minutes, 5 seconds)
2025-05-10 11:28:43,736 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 11:28:55,077 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1221 [DEBUG]: Total Reward: -41.21192 ± 37.841
2025-05-10 11:28:55,077 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1222 [DEBUG]: All rewards: [-20.869728, -94.924255, -49.902866, 18.410442, -25.092424, -41.18235, -72.34539, 19.861755, -87.27237, -58.802044]
2025-05-10 11:28:55,077 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1223 [DEBUG]: All trajectory lengths: [262.0, 257.0, 1000.0, 133.0, 172.0, 1000.0, 1000.0, 23.0, 218.0, 1000.0]
2025-05-10 11:28:55,094 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-ant):1251 [DEBUG]: Training session finished
