2025-05-09 18:07:57,092 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-sac
2025-05-09 18:07:57,093 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-sac
2025-05-09 18:07:57,093 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x77125fc40f70>}
2025-05-09 18:07:57,093 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1111 [DEBUG]: using device: cpu
2025-05-09 18:07:57,097 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1133 [INFO]: Creating new trainer
2025-05-09 18:07:57,102 baseline-sac-noisy-hopper:111 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=11, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2.]]), shift: tensor([[-1., -1., -1.]]))
)
2025-05-09 18:07:57,102 baseline-sac-noisy-hopper:112 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=14, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-09 18:07:57,250 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1194 [DEBUG]: Starting training session...
2025-05-09 18:07:57,250 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 1/100
2025-05-09 18:10:10,574 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 18:10:10,686 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 8.18105 ± 1.956
2025-05-09 18:10:10,686 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [8.339876, 7.13986, 7.231487, 6.585612, 8.241963, 7.328432, 7.3436966, 13.740669, 7.140576, 8.718277]
2025-05-09 18:10:10,686 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [10.0, 9.0, 9.0, 9.0, 10.0, 9.0, 9.0, 16.0, 9.0, 11.0]
2025-05-09 18:10:10,686 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1226 [INFO]: New best (8.18) for latency MM1Queue_a033_s075
2025-05-09 18:10:10,686 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1229 [INFO]: saving network
2025-05-09 18:10:10,690 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-sac/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 18:10:10,695 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 2/100 (estimated time remaining: 3 hours, 40 minutes, 10 seconds)
2025-05-09 18:12:32,015 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 18:12:32,525 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 80.09935 ± 69.492
2025-05-09 18:12:32,525 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [227.76408, 153.4111, 46.414284, 153.23367, 41.17114, 8.270183, 25.933111, 12.959171, 73.32872, 58.508034]
2025-05-09 18:12:32,525 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [103.0, 84.0, 28.0, 81.0, 38.0, 10.0, 24.0, 14.0, 41.0, 34.0]
2025-05-09 18:12:32,525 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1226 [INFO]: New best (80.10) for latency MM1Queue_a033_s075
2025-05-09 18:12:32,525 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1229 [INFO]: saving network
2025-05-09 18:12:32,529 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-sac/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 18:12:32,535 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 44 minutes, 48 seconds)
2025-05-09 18:14:55,318 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 18:14:56,018 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 120.32597 ± 81.623
2025-05-09 18:14:56,018 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [10.628999, 153.6006, 206.36894, 167.47958, 10.337208, 120.63911, 194.33241, 243.11328, 69.01171, 27.747833]
2025-05-09 18:14:56,018 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [12.0, 86.0, 103.0, 88.0, 12.0, 63.0, 91.0, 116.0, 42.0, 24.0]
2025-05-09 18:14:56,018 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1226 [INFO]: New best (120.33) for latency MM1Queue_a033_s075
2025-05-09 18:14:56,018 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1229 [INFO]: saving network
2025-05-09 18:14:56,022 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-sac/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 18:14:56,027 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 45 minutes, 40 seconds)
2025-05-09 18:17:19,603 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 18:17:20,133 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 86.63760 ± 83.181
2025-05-09 18:17:20,134 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [42.733017, 141.90105, 15.050457, 10.517931, 19.55192, 10.402924, 212.56187, 27.604153, 170.67194, 215.38078]
2025-05-09 18:17:20,134 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [31.0, 72.0, 15.0, 12.0, 21.0, 13.0, 103.0, 24.0, 85.0, 99.0]
2025-05-09 18:17:20,135 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 45 minutes, 9 seconds)
2025-05-09 18:19:43,780 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 18:19:44,780 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 196.35489 ± 116.188
2025-05-09 18:19:44,780 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [75.58242, 231.72273, 184.5394, 325.10376, 39.022114, 78.201225, 376.411, 339.3149, 219.06207, 94.5891]
2025-05-09 18:19:44,780 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [48.0, 100.0, 85.0, 117.0, 30.0, 43.0, 192.0, 132.0, 96.0, 56.0]
2025-05-09 18:19:44,781 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1226 [INFO]: New best (196.35) for latency MM1Queue_a033_s075
2025-05-09 18:19:44,781 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1229 [INFO]: saving network
2025-05-09 18:19:44,784 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-sac/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 18:19:44,790 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 44 minutes, 3 seconds)
2025-05-09 18:22:08,590 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 18:22:09,294 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 130.16751 ± 94.088
2025-05-09 18:22:09,294 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [9.425911, 176.18208, 204.53621, 9.929063, 145.70717, 320.98425, 137.78284, 133.17564, 9.596376, 154.35545]
2025-05-09 18:22:09,294 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [11.0, 87.0, 89.0, 11.0, 81.0, 119.0, 70.0, 71.0, 11.0, 80.0]
2025-05-09 18:22:09,296 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 45 minutes, 9 seconds)
2025-05-09 18:24:35,068 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 18:24:35,865 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 170.37755 ± 135.224
2025-05-09 18:24:35,865 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [13.68938, 300.22742, 289.6029, 196.82927, 339.7262, 44.08225, 8.2604265, 99.23167, 51.128155, 360.99777]
2025-05-09 18:24:35,865 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [14.0, 118.0, 111.0, 91.0, 134.0, 26.0, 10.0, 56.0, 35.0, 141.0]
2025-05-09 18:24:35,867 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 8/100 (estimated time remaining: 3 hours, 44 minutes, 13 seconds)
2025-05-09 18:27:04,613 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 18:27:05,255 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 116.67483 ± 109.987
2025-05-09 18:27:05,255 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [11.160295, 10.853257, 95.9996, 186.8035, 38.604904, 171.19493, 87.9522, 16.807465, 163.85893, 383.51312]
2025-05-09 18:27:05,255 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [12.0, 12.0, 64.0, 90.0, 32.0, 81.0, 56.0, 17.0, 85.0, 141.0]
2025-05-09 18:27:05,257 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 9/100 (estimated time remaining: 3 hours, 43 minutes, 37 seconds)
2025-05-09 18:29:33,105 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 18:29:33,635 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 81.41792 ± 58.512
2025-05-09 18:29:33,636 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [137.1217, 12.696552, 108.91087, 106.58042, 97.68923, 9.058506, 9.452612, 198.48857, 56.447834, 77.73296]
2025-05-09 18:29:33,636 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [75.0, 14.0, 62.0, 60.0, 63.0, 11.0, 12.0, 94.0, 38.0, 46.0]
2025-05-09 18:29:33,638 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 10/100 (estimated time remaining: 3 hours, 42 minutes, 29 seconds)
2025-05-09 18:32:03,121 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 18:32:03,688 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 95.95292 ± 80.652
2025-05-09 18:32:03,688 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [196.56456, 83.26268, 17.956793, 122.50703, 267.7736, 97.56742, 39.154392, 9.99356, 9.552431, 115.19669]
2025-05-09 18:32:03,688 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [88.0, 47.0, 27.0, 68.0, 108.0, 54.0, 33.0, 11.0, 11.0, 63.0]
2025-05-09 18:32:03,691 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 11/100 (estimated time remaining: 3 hours, 41 minutes, 40 seconds)
2025-05-09 18:34:31,460 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 18:34:32,445 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 202.63353 ± 123.974
2025-05-09 18:34:32,445 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [134.45648, 346.7236, 8.474891, 201.61075, 234.86856, 186.62804, 363.94174, 364.70486, 12.681748, 172.2446]
2025-05-09 18:34:32,446 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [69.0, 123.0, 10.0, 92.0, 112.0, 90.0, 127.0, 151.0, 15.0, 83.0]
2025-05-09 18:34:32,446 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1226 [INFO]: New best (202.63) for latency MM1Queue_a033_s075
2025-05-09 18:34:32,446 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1229 [INFO]: saving network
2025-05-09 18:34:32,450 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-sac/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 18:34:32,456 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 12/100 (estimated time remaining: 3 hours, 40 minutes, 28 seconds)
2025-05-09 18:37:01,203 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 18:37:01,941 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 133.51262 ± 105.202
2025-05-09 18:37:01,942 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [165.91, 80.69738, 382.2762, 10.689463, 205.84018, 203.10489, 96.83595, 14.356806, 82.5995, 92.815796]
2025-05-09 18:37:01,942 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [89.0, 48.0, 139.0, 12.0, 100.0, 93.0, 56.0, 21.0, 49.0, 52.0]
2025-05-09 18:37:01,944 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 13/100 (estimated time remaining: 3 hours, 38 minutes, 50 seconds)
2025-05-09 18:39:29,357 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 18:39:30,238 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 187.25658 ± 132.893
2025-05-09 18:39:30,238 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [45.130917, 169.25143, 347.5014, 14.027687, 55.522522, 296.67258, 49.764565, 201.66159, 371.32532, 321.7078]
2025-05-09 18:39:30,238 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [31.0, 79.0, 128.0, 15.0, 37.0, 112.0, 34.0, 92.0, 130.0, 120.0]
2025-05-09 18:39:30,241 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 14/100 (estimated time remaining: 3 hours, 36 minutes, 2 seconds)
2025-05-09 18:41:58,632 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 18:41:59,652 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 234.90596 ± 138.520
2025-05-09 18:41:59,653 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [8.601511, 367.62985, 324.63245, 71.08595, 363.96954, 325.42618, 8.588068, 271.7138, 328.88025, 278.53186]
2025-05-09 18:41:59,653 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [11.0, 128.0, 126.0, 42.0, 134.0, 119.0, 10.0, 105.0, 121.0, 113.0]
2025-05-09 18:41:59,653 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1226 [INFO]: New best (234.91) for latency MM1Queue_a033_s075
2025-05-09 18:41:59,653 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1229 [INFO]: saving network
2025-05-09 18:41:59,657 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-sac/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 18:41:59,664 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 15/100 (estimated time remaining: 3 hours, 33 minutes, 51 seconds)
2025-05-09 18:44:28,398 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 18:44:29,142 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 143.28090 ± 91.862
2025-05-09 18:44:29,142 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [236.90984, 172.271, 309.83536, 162.01839, 197.92102, 13.918025, 162.21385, 114.8142, 51.310352, 11.596909]
2025-05-09 18:44:29,143 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [97.0, 81.0, 114.0, 76.0, 86.0, 16.0, 76.0, 60.0, 36.0, 13.0]
2025-05-09 18:44:29,146 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 16/100 (estimated time remaining: 3 hours, 31 minutes, 12 seconds)
2025-05-09 18:46:59,142 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 18:47:00,116 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 196.41629 ± 121.056
2025-05-09 18:47:00,116 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [190.31624, 58.757607, 347.52356, 369.71164, 177.86249, 47.266, 53.51062, 315.23422, 295.53403, 108.44651]
2025-05-09 18:47:00,116 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [84.0, 38.0, 137.0, 137.0, 81.0, 41.0, 37.0, 122.0, 128.0, 57.0]
2025-05-09 18:47:00,120 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 17/100 (estimated time remaining: 3 hours, 29 minutes, 20 seconds)
2025-05-09 18:49:27,991 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 18:49:28,940 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 202.01976 ± 131.953
2025-05-09 18:49:28,940 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [57.470097, 15.2581215, 384.6083, 320.89618, 167.54634, 322.12842, 389.41473, 135.59369, 120.93599, 106.34568]
2025-05-09 18:49:28,940 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [34.0, 16.0, 143.0, 141.0, 77.0, 116.0, 144.0, 68.0, 64.0, 62.0]
2025-05-09 18:49:28,944 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 18/100 (estimated time remaining: 3 hours, 26 minutes, 40 seconds)
2025-05-09 18:51:57,299 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 18:51:58,465 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 259.81082 ± 119.578
2025-05-09 18:51:58,465 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [92.47835, 351.0576, 287.32077, 189.10117, 252.22421, 10.834463, 404.5196, 338.6654, 343.19366, 328.71298]
2025-05-09 18:51:58,466 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [51.0, 133.0, 107.0, 100.0, 100.0, 12.0, 151.0, 125.0, 129.0, 123.0]
2025-05-09 18:51:58,466 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1226 [INFO]: New best (259.81) for latency MM1Queue_a033_s075
2025-05-09 18:51:58,466 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1229 [INFO]: saving network
2025-05-09 18:51:58,470 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-sac/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 18:51:58,477 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 19/100 (estimated time remaining: 3 hours, 24 minutes, 31 seconds)
2025-05-09 18:54:27,776 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 18:54:28,694 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 202.42589 ± 132.543
2025-05-09 18:54:28,695 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [9.2086935, 340.01947, 12.31744, 341.67773, 151.88113, 355.88254, 160.95976, 341.88004, 233.16524, 77.266884]
2025-05-09 18:54:28,695 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [11.0, 129.0, 13.0, 127.0, 82.0, 129.0, 75.0, 125.0, 95.0, 49.0]
2025-05-09 18:54:28,699 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 20/100 (estimated time remaining: 3 hours, 22 minutes, 14 seconds)
2025-05-09 18:56:58,513 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 18:56:59,263 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 134.77524 ± 71.329
2025-05-09 18:56:59,263 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [63.729427, 90.86266, 127.24689, 210.81947, 150.80191, 172.55792, 119.555504, 279.94055, 119.25344, 12.984587]
2025-05-09 18:56:59,263 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [39.0, 52.0, 78.0, 99.0, 73.0, 80.0, 64.0, 120.0, 65.0, 15.0]
2025-05-09 18:56:59,267 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 21/100 (estimated time remaining: 3 hours, 20 minutes, 1 second)
2025-05-09 18:59:28,344 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 18:59:29,255 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 176.97330 ± 121.046
2025-05-09 18:59:29,256 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [228.30269, 344.174, 234.33063, 75.18423, 41.532623, 213.96977, 11.563517, 369.00854, 44.907978, 206.75899]
2025-05-09 18:59:29,256 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [116.0, 126.0, 101.0, 43.0, 60.0, 93.0, 13.0, 133.0, 33.0, 109.0]
2025-05-09 18:59:29,260 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 22/100 (estimated time remaining: 3 hours, 17 minutes, 16 seconds)
2025-05-09 19:01:56,491 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 19:01:57,773 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 266.40347 ± 113.246
2025-05-09 19:01:57,773 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [359.98926, 162.78458, 269.33777, 352.69464, 365.32996, 344.0253, 118.72632, 293.8522, 362.57104, 34.72377]
2025-05-09 19:01:57,773 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [132.0, 78.0, 122.0, 173.0, 137.0, 133.0, 60.0, 112.0, 141.0, 28.0]
2025-05-09 19:01:57,773 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1226 [INFO]: New best (266.40) for latency MM1Queue_a033_s075
2025-05-09 19:01:57,774 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1229 [INFO]: saving network
2025-05-09 19:01:57,777 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-sac/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 19:01:57,786 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 23/100 (estimated time remaining: 3 hours, 14 minutes, 41 seconds)
2025-05-09 19:04:26,683 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 19:04:27,660 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 195.08426 ± 108.700
2025-05-09 19:04:27,660 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [254.5138, 102.344925, 73.681526, 374.89597, 380.74265, 70.48975, 187.87291, 103.32953, 187.21658, 215.75504]
2025-05-09 19:04:27,660 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [103.0, 60.0, 48.0, 133.0, 133.0, 47.0, 87.0, 59.0, 107.0, 93.0]
2025-05-09 19:04:27,665 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 24/100 (estimated time remaining: 3 hours, 12 minutes, 17 seconds)
2025-05-09 19:06:56,747 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 19:06:57,353 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 97.80858 ± 71.395
2025-05-09 19:06:57,353 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [17.71277, 114.04169, 122.719635, 143.4798, 152.56375, 120.860825, 42.262756, 240.2282, 12.630011, 11.58644]
2025-05-09 19:06:57,353 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [17.0, 80.0, 67.0, 73.0, 77.0, 64.0, 33.0, 101.0, 14.0, 13.0]
2025-05-09 19:06:57,358 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 25/100 (estimated time remaining: 3 hours, 9 minutes, 39 seconds)
2025-05-09 19:09:25,807 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 19:09:27,016 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 266.30316 ± 110.852
2025-05-09 19:09:27,016 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [191.42274, 262.40384, 399.07727, 299.80994, 353.81592, 394.08725, 363.51685, 52.79684, 140.70352, 205.39726]
2025-05-09 19:09:27,017 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [85.0, 106.0, 162.0, 115.0, 131.0, 137.0, 137.0, 34.0, 69.0, 118.0]
2025-05-09 19:09:27,021 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 26/100 (estimated time remaining: 3 hours, 6 minutes, 56 seconds)
2025-05-09 19:11:54,755 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 19:11:55,902 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 256.17575 ± 123.640
2025-05-09 19:11:55,902 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [324.76187, 377.93552, 47.005592, 90.0965, 395.79593, 182.09674, 317.52768, 363.14932, 333.2226, 130.1658]
2025-05-09 19:11:55,902 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [115.0, 129.0, 35.0, 54.0, 156.0, 82.0, 121.0, 135.0, 140.0, 67.0]
2025-05-09 19:11:55,907 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 27/100 (estimated time remaining: 3 hours, 4 minutes, 10 seconds)
2025-05-09 19:14:24,714 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 19:14:25,689 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 191.69228 ± 141.059
2025-05-09 19:14:25,690 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [337.04205, 13.17167, 358.73456, 16.040426, 21.276293, 83.00153, 369.47296, 195.6303, 311.5924, 210.96042]
2025-05-09 19:14:25,690 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [142.0, 19.0, 128.0, 21.0, 25.0, 51.0, 130.0, 102.0, 135.0, 105.0]
2025-05-09 19:14:25,695 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 28/100 (estimated time remaining: 3 hours, 1 minute, 59 seconds)
2025-05-09 19:16:54,492 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 19:16:55,465 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 212.40598 ± 113.938
2025-05-09 19:16:55,465 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [14.372442, 139.87544, 113.28862, 158.78014, 360.96173, 181.45346, 342.03568, 236.32365, 390.12387, 186.84483]
2025-05-09 19:16:55,466 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [14.0, 70.0, 61.0, 75.0, 123.0, 86.0, 122.0, 93.0, 135.0, 86.0]
2025-05-09 19:16:55,471 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 59 minutes, 28 seconds)
2025-05-09 19:19:23,880 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 19:19:25,142 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 279.69913 ± 106.858
2025-05-09 19:19:25,143 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [362.15192, 142.59656, 208.91164, 344.922, 393.90945, 219.62198, 324.50797, 393.79877, 69.1822, 337.3888]
2025-05-09 19:19:25,143 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [131.0, 80.0, 92.0, 125.0, 136.0, 118.0, 119.0, 140.0, 41.0, 131.0]
2025-05-09 19:19:25,143 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1226 [INFO]: New best (279.70) for latency MM1Queue_a033_s075
2025-05-09 19:19:25,143 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1229 [INFO]: saving network
2025-05-09 19:19:25,147 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-sac/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 19:19:25,156 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 56 minutes, 58 seconds)
2025-05-09 19:21:52,847 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 19:21:53,806 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 210.80095 ± 119.869
2025-05-09 19:21:53,807 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [9.342343, 91.881874, 82.73567, 187.05092, 308.56992, 254.16774, 351.0625, 369.6054, 139.7599, 313.8334]
2025-05-09 19:21:53,807 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [11.0, 51.0, 50.0, 84.0, 114.0, 96.0, 125.0, 130.0, 73.0, 118.0]
2025-05-09 19:21:53,812 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 54 minutes, 15 seconds)
2025-05-09 19:24:21,874 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 19:24:22,702 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 165.28981 ± 118.709
2025-05-09 19:24:22,702 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [372.434, 162.63884, 181.8189, 73.91799, 234.81235, 116.144424, 363.15582, 9.556911, 57.680363, 80.73843]
2025-05-09 19:24:22,702 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [130.0, 83.0, 83.0, 45.0, 93.0, 62.0, 138.0, 12.0, 38.0, 47.0]
2025-05-09 19:24:22,708 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 51 minutes, 45 seconds)
2025-05-09 19:26:52,683 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 19:26:53,734 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 224.80923 ± 116.414
2025-05-09 19:26:53,734 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [153.3535, 229.37976, 83.40002, 315.5418, 339.74628, 380.91156, 50.610867, 241.44237, 357.15707, 96.549034]
2025-05-09 19:26:53,734 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [79.0, 95.0, 59.0, 121.0, 126.0, 138.0, 35.0, 97.0, 127.0, 55.0]
2025-05-09 19:26:53,740 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 49 minutes, 33 seconds)
2025-05-09 19:29:20,657 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 19:29:21,459 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 161.22804 ± 139.819
2025-05-09 19:29:21,459 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [114.670555, 376.16245, 348.05542, 86.657555, 9.558292, 116.798996, 193.31946, 11.884584, 347.61304, 7.559898]
2025-05-09 19:29:21,459 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [65.0, 160.0, 130.0, 50.0, 11.0, 61.0, 86.0, 13.0, 124.0, 9.0]
2025-05-09 19:29:21,465 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 46 minutes, 36 seconds)
2025-05-09 19:31:49,940 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 19:31:50,818 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 192.45752 ± 131.406
2025-05-09 19:31:50,819 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [116.74647, 258.5821, 389.6532, 116.39603, 391.42847, 10.118471, 286.1888, 137.36443, 10.741979, 207.35521]
2025-05-09 19:31:50,819 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [62.0, 103.0, 137.0, 62.0, 140.0, 12.0, 111.0, 69.0, 14.0, 89.0]
2025-05-09 19:31:50,825 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 44 minutes, 2 seconds)
2025-05-09 19:34:19,030 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 19:34:19,798 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 169.06189 ± 149.345
2025-05-09 19:34:19,798 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [12.139785, 69.49286, 70.83746, 390.79462, 388.00143, 131.08167, 389.04126, 101.741554, 8.493657, 128.9946]
2025-05-09 19:34:19,798 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [13.0, 46.0, 45.0, 132.0, 137.0, 65.0, 130.0, 55.0, 10.0, 63.0]
2025-05-09 19:34:19,805 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 41 minutes, 37 seconds)
2025-05-09 19:36:48,886 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 19:36:49,872 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 217.50888 ± 125.940
2025-05-09 19:36:49,872 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [379.44348, 96.40492, 124.57441, 342.0148, 20.05987, 371.07196, 336.39932, 101.31488, 163.88397, 239.92119]
2025-05-09 19:36:49,872 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [133.0, 53.0, 66.0, 124.0, 21.0, 128.0, 125.0, 68.0, 78.0, 98.0]
2025-05-09 19:36:49,878 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 39 minutes, 23 seconds)
2025-05-09 19:39:18,284 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 19:39:19,368 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 237.63982 ± 151.441
2025-05-09 19:39:19,368 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [399.0383, 324.13528, 32.617584, 385.03528, 246.53735, 11.995225, 352.49615, 35.109985, 395.5423, 193.89085]
2025-05-09 19:39:19,368 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [139.0, 116.0, 28.0, 135.0, 100.0, 13.0, 142.0, 30.0, 152.0, 94.0]
2025-05-09 19:39:19,375 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 36 minutes, 34 seconds)
2025-05-09 19:41:47,430 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 19:41:48,605 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 269.48569 ± 100.126
2025-05-09 19:41:48,605 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [329.5295, 180.62798, 333.56396, 263.46957, 278.54016, 394.02405, 392.7591, 41.62834, 250.48784, 230.22667]
2025-05-09 19:41:48,605 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [127.0, 91.0, 120.0, 103.0, 106.0, 141.0, 131.0, 33.0, 103.0, 95.0]
2025-05-09 19:41:48,612 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 34 minutes, 24 seconds)
2025-05-09 19:44:16,783 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 19:44:17,791 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 221.76262 ± 146.135
2025-05-09 19:44:17,791 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [15.3741, 134.4944, 346.94455, 240.35985, 383.80157, 381.04242, 215.11224, 82.56804, 14.088089, 403.841]
2025-05-09 19:44:17,791 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [15.0, 86.0, 130.0, 94.0, 134.0, 131.0, 91.0, 51.0, 15.0, 139.0]
2025-05-09 19:44:17,798 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 31 minutes, 53 seconds)
2025-05-09 19:46:47,548 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 19:46:48,299 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 144.36783 ± 107.837
2025-05-09 19:46:48,299 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [57.539352, 41.861454, 115.76311, 88.48882, 137.72955, 270.4852, 9.501791, 338.75333, 97.22546, 286.33035]
2025-05-09 19:46:48,299 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [39.0, 33.0, 64.0, 63.0, 69.0, 103.0, 11.0, 125.0, 53.0, 126.0]
2025-05-09 19:46:48,306 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 41/100 (estimated time remaining: 2 hours, 29 minutes, 42 seconds)
2025-05-09 19:49:16,195 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 19:49:17,214 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 226.39685 ± 134.635
2025-05-09 19:49:17,214 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [230.93884, 141.66278, 16.524176, 33.06322, 327.38892, 395.9377, 349.48434, 282.5898, 111.2327, 375.14597]
2025-05-09 19:49:17,214 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [105.0, 70.0, 20.0, 28.0, 117.0, 136.0, 127.0, 107.0, 61.0, 128.0]
2025-05-09 19:49:17,222 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 42/100 (estimated time remaining: 2 hours, 26 minutes, 58 seconds)
2025-05-09 19:51:46,782 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 19:51:47,461 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 127.52677 ± 126.472
2025-05-09 19:51:47,462 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [117.05525, 67.55517, 394.3358, 57.327396, 243.24171, 13.514487, 283.95865, 15.671809, 10.567318, 72.04029]
2025-05-09 19:51:47,462 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [65.0, 43.0, 140.0, 38.0, 97.0, 14.0, 111.0, 16.0, 12.0, 67.0]
2025-05-09 19:51:47,469 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 43/100 (estimated time remaining: 2 hours, 24 minutes, 37 seconds)
2025-05-09 19:54:14,858 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 19:54:15,636 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 168.98602 ± 152.962
2025-05-09 19:54:15,636 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [14.097869, 12.197941, 13.791309, 345.10855, 227.01756, 184.18675, 382.64725, 88.906906, 399.51617, 22.389982]
2025-05-09 19:54:15,636 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [15.0, 13.0, 15.0, 123.0, 92.0, 82.0, 137.0, 51.0, 143.0, 23.0]
2025-05-09 19:54:15,643 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 44/100 (estimated time remaining: 2 hours, 21 minutes, 56 seconds)
2025-05-09 19:56:43,058 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 19:56:43,855 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 167.28293 ± 144.651
2025-05-09 19:56:43,855 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [67.16737, 371.5647, 60.378418, 14.326706, 21.182686, 396.98605, 176.07404, 108.22162, 91.53255, 365.39505]
2025-05-09 19:56:43,855 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [42.0, 130.0, 44.0, 17.0, 19.0, 137.0, 79.0, 64.0, 52.0, 132.0]
2025-05-09 19:56:43,863 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 45/100 (estimated time remaining: 2 hours, 19 minutes, 15 seconds)
2025-05-09 19:59:12,566 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 19:59:13,481 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 195.58676 ± 139.796
2025-05-09 19:59:13,481 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [14.340835, 419.75198, 128.92488, 65.11839, 124.19596, 210.99626, 198.65222, 437.9117, 292.3981, 63.57725]
2025-05-09 19:59:13,481 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [16.0, 142.0, 68.0, 44.0, 63.0, 91.0, 88.0, 169.0, 109.0, 42.0]
2025-05-09 19:59:13,489 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 46/100 (estimated time remaining: 2 hours, 16 minutes, 37 seconds)
2025-05-09 20:01:42,135 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 20:01:43,038 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 196.34399 ± 152.451
2025-05-09 20:01:43,038 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [10.523727, 371.07248, 315.0022, 258.5283, 366.34064, 133.01147, 13.76096, 393.12018, 17.780254, 84.29971]
2025-05-09 20:01:43,038 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [12.0, 134.0, 117.0, 110.0, 135.0, 68.0, 15.0, 137.0, 23.0, 50.0]
2025-05-09 20:01:43,046 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 47/100 (estimated time remaining: 2 hours, 14 minutes, 14 seconds)
2025-05-09 20:04:13,081 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 20:04:14,062 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 207.25960 ± 142.259
2025-05-09 20:04:14,062 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [10.193942, 18.753448, 171.35677, 384.21194, 305.40198, 342.45914, 287.92938, 358.8049, 179.18059, 14.304032]
2025-05-09 20:04:14,062 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [11.0, 19.0, 82.0, 137.0, 113.0, 158.0, 110.0, 126.0, 92.0, 15.0]
2025-05-09 20:04:14,070 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 48/100 (estimated time remaining: 2 hours, 11 minutes, 53 seconds)
2025-05-09 20:06:41,549 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 20:06:42,331 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 160.38904 ± 127.639
2025-05-09 20:06:42,331 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [314.02417, 384.14984, 89.50297, 80.223175, 48.642094, 16.35496, 258.4278, 132.73474, 10.792623, 269.038]
2025-05-09 20:06:42,331 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [115.0, 135.0, 52.0, 50.0, 35.0, 16.0, 105.0, 67.0, 12.0, 113.0]
2025-05-09 20:06:42,339 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 49/100 (estimated time remaining: 2 hours, 9 minutes, 25 seconds)
2025-05-09 20:09:10,327 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 20:09:11,393 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 240.59636 ± 119.112
2025-05-09 20:09:11,393 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [217.23254, 382.48865, 107.23117, 341.07767, 276.97974, 331.09238, 273.03165, 14.27149, 106.07608, 356.48227]
2025-05-09 20:09:11,393 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [97.0, 133.0, 58.0, 124.0, 110.0, 118.0, 105.0, 14.0, 59.0, 131.0]
2025-05-09 20:09:11,402 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 50/100 (estimated time remaining: 2 hours, 7 minutes, 4 seconds)
2025-05-09 20:11:38,699 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 20:11:39,448 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 149.72641 ± 123.502
2025-05-09 20:11:39,449 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [389.35495, 26.644638, 362.42285, 172.5493, 67.3623, 111.64525, 149.10326, 138.9874, 67.78826, 11.4059725]
2025-05-09 20:11:39,449 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [132.0, 27.0, 129.0, 83.0, 43.0, 59.0, 71.0, 77.0, 41.0, 13.0]
2025-05-09 20:11:39,457 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 51/100 (estimated time remaining: 2 hours, 4 minutes, 19 seconds)
2025-05-09 20:14:07,268 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 20:14:08,359 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 254.95322 ± 141.423
2025-05-09 20:14:08,360 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [381.63232, 323.8814, 377.88666, 397.32266, 88.968155, 9.99485, 356.3455, 151.37589, 100.45215, 361.67273]
2025-05-09 20:14:08,360 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [130.0, 118.0, 130.0, 143.0, 52.0, 11.0, 125.0, 72.0, 55.0, 129.0]
2025-05-09 20:14:08,368 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 52/100 (estimated time remaining: 2 hours, 1 minute, 44 seconds)
2025-05-09 20:16:35,135 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 20:16:35,848 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 150.58076 ± 161.129
2025-05-09 20:16:35,848 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [20.661516, 13.269278, 67.43008, 393.24908, 405.06433, 143.85089, 369.6864, 12.157982, 69.83412, 10.603826]
2025-05-09 20:16:35,848 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [22.0, 14.0, 43.0, 134.0, 155.0, 71.0, 134.0, 14.0, 39.0, 12.0]
2025-05-09 20:16:35,857 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 58 minutes, 41 seconds)
2025-05-09 20:19:03,034 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 20:19:03,758 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 140.09543 ± 104.636
2025-05-09 20:19:03,758 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [41.958252, 186.27109, 80.183105, 160.509, 164.30333, 228.66232, 86.41398, 11.18785, 383.18857, 58.276726]
2025-05-09 20:19:03,759 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [30.0, 85.0, 49.0, 75.0, 76.0, 100.0, 51.0, 12.0, 131.0, 38.0]
2025-05-09 20:19:03,767 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 56 minutes, 9 seconds)
2025-05-09 20:21:31,910 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 20:21:32,723 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 169.30188 ± 135.737
2025-05-09 20:21:32,723 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [17.328844, 9.6378145, 132.08937, 274.16748, 138.19174, 385.41306, 14.053994, 213.64404, 121.76206, 386.73038]
2025-05-09 20:21:32,723 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [21.0, 11.0, 66.0, 104.0, 70.0, 136.0, 14.0, 106.0, 65.0, 133.0]
2025-05-09 20:21:32,732 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 53 minutes, 40 seconds)
2025-05-09 20:24:01,143 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 20:24:01,880 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 160.95538 ± 150.944
2025-05-09 20:24:01,881 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [13.10508, 11.3339405, 152.24585, 244.57977, 319.29022, 379.90704, 71.46733, 389.65268, 15.75793, 12.213974]
2025-05-09 20:24:01,881 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [14.0, 12.0, 74.0, 96.0, 118.0, 132.0, 45.0, 133.0, 19.0, 13.0]
2025-05-09 20:24:01,890 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 51 minutes, 21 seconds)
2025-05-09 20:26:28,635 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 20:26:29,415 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 174.56213 ± 167.453
2025-05-09 20:26:29,415 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [17.428644, 405.37433, 20.728148, 388.65967, 342.92706, 9.840837, 337.82114, 9.523242, 191.34814, 21.970062]
2025-05-09 20:26:29,415 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [16.0, 151.0, 22.0, 139.0, 121.0, 11.0, 121.0, 11.0, 85.0, 22.0]
2025-05-09 20:26:29,425 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 48 minutes, 41 seconds)
2025-05-09 20:28:56,506 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 20:28:57,330 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 169.43817 ± 115.011
2025-05-09 20:28:57,330 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [134.05469, 284.63937, 10.725658, 19.03976, 130.15558, 78.16936, 224.25485, 196.42813, 214.42627, 402.48804]
2025-05-09 20:28:57,330 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [69.0, 109.0, 12.0, 20.0, 66.0, 45.0, 95.0, 89.0, 92.0, 138.0]
2025-05-09 20:28:57,340 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 46 minutes, 16 seconds)
2025-05-09 20:31:25,423 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 20:31:26,459 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 227.74312 ± 142.951
2025-05-09 20:31:26,459 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [22.755596, 365.3628, 21.495932, 360.58807, 370.71603, 317.2078, 369.00912, 57.491253, 152.49947, 240.30501]
2025-05-09 20:31:26,459 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [23.0, 130.0, 22.0, 131.0, 129.0, 116.0, 128.0, 40.0, 73.0, 125.0]
2025-05-09 20:31:26,469 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 43 minutes, 58 seconds)
2025-05-09 20:33:54,406 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 20:33:55,414 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 217.92062 ± 126.965
2025-05-09 20:33:55,415 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [127.071945, 300.61615, 25.376865, 26.686169, 341.125, 374.25906, 174.16754, 186.1008, 390.17233, 233.63039]
2025-05-09 20:33:55,415 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [65.0, 117.0, 24.0, 25.0, 126.0, 140.0, 78.0, 88.0, 138.0, 98.0]
2025-05-09 20:33:55,424 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 41 minutes, 30 seconds)
2025-05-09 20:36:22,345 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 20:36:23,348 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 216.31503 ± 157.747
2025-05-09 20:36:23,349 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [321.59183, 309.9194, 405.85678, 348.9776, 7.055772, 338.12378, 334.82886, 21.8719, 14.946317, 59.978394]
2025-05-09 20:36:23,349 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [127.0, 129.0, 139.0, 125.0, 11.0, 126.0, 146.0, 23.0, 16.0, 45.0]
2025-05-09 20:36:23,359 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 38 minutes, 51 seconds)
2025-05-09 20:38:51,750 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 20:38:52,648 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 196.99069 ± 127.926
2025-05-09 20:38:52,648 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [378.0479, 242.29941, 116.63871, 229.23709, 392.14917, 212.8885, 14.360015, 272.0889, 102.39188, 9.80526]
2025-05-09 20:38:52,649 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [132.0, 98.0, 61.0, 96.0, 134.0, 90.0, 14.0, 105.0, 58.0, 11.0]
2025-05-09 20:38:52,682 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 36 minutes, 37 seconds)
2025-05-09 20:41:19,802 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 20:41:20,463 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 134.99689 ± 133.117
2025-05-09 20:41:20,463 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [12.012898, 41.354866, 76.275085, 15.385023, 380.54688, 125.86709, 9.8808975, 162.77238, 148.89511, 376.97864]
2025-05-09 20:41:20,463 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [13.0, 32.0, 46.0, 17.0, 131.0, 65.0, 11.0, 76.0, 72.0, 130.0]
2025-05-09 20:41:20,474 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 34 minutes, 7 seconds)
2025-05-09 20:43:49,558 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 20:43:50,612 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 232.07320 ± 114.674
2025-05-09 20:43:50,612 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [304.22018, 361.51318, 256.1026, 133.96138, 310.26273, 101.11338, 293.52216, 178.20903, 9.24162, 372.58575]
2025-05-09 20:43:50,612 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [119.0, 133.0, 102.0, 68.0, 119.0, 57.0, 111.0, 82.0, 11.0, 139.0]
2025-05-09 20:43:50,623 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 31 minutes, 46 seconds)
2025-05-09 20:46:18,786 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 20:46:19,858 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 245.22722 ± 133.999
2025-05-09 20:46:19,858 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [75.973434, 341.6547, 82.05931, 55.11839, 388.9201, 395.69925, 168.9552, 318.98184, 225.1006, 399.80936]
2025-05-09 20:46:19,858 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [48.0, 131.0, 50.0, 37.0, 134.0, 136.0, 77.0, 116.0, 94.0, 138.0]
2025-05-09 20:46:19,869 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 29 minutes, 20 seconds)
2025-05-09 20:48:48,497 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 20:48:49,485 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 193.85822 ± 111.485
2025-05-09 20:48:49,485 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [276.68765, 195.80649, 14.730166, 65.973625, 159.51512, 123.74943, 361.58685, 114.352264, 311.95767, 314.22287]
2025-05-09 20:48:49,485 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [114.0, 104.0, 15.0, 42.0, 74.0, 76.0, 129.0, 62.0, 133.0, 138.0]
2025-05-09 20:48:49,496 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 27 minutes, 2 seconds)
2025-05-09 20:51:15,538 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 20:51:16,484 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 206.83611 ± 113.819
2025-05-09 20:51:16,484 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [297.05344, 226.03746, 11.955544, 378.9278, 212.20876, 209.57896, 147.42056, 14.853224, 265.7855, 304.53973]
2025-05-09 20:51:16,484 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [113.0, 92.0, 15.0, 133.0, 91.0, 90.0, 73.0, 16.0, 105.0, 114.0]
2025-05-09 20:51:16,495 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 24 minutes, 17 seconds)
2025-05-09 20:53:44,314 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 20:53:45,488 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 264.48486 ± 108.702
2025-05-09 20:53:45,488 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [140.21912, 399.53714, 175.73972, 394.58145, 200.61894, 237.21227, 390.35876, 147.7359, 391.08313, 167.76212]
2025-05-09 20:53:45,488 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [68.0, 139.0, 85.0, 138.0, 87.0, 108.0, 133.0, 71.0, 141.0, 77.0]
2025-05-09 20:53:45,499 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 21 minutes, 57 seconds)
2025-05-09 20:56:13,376 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 20:56:14,664 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 308.17291 ± 113.693
2025-05-09 20:56:14,664 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [399.05948, 252.81322, 10.3935175, 386.35468, 327.0676, 399.97092, 350.76752, 386.48636, 232.97173, 335.84424]
2025-05-09 20:56:14,665 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [140.0, 101.0, 12.0, 137.0, 120.0, 143.0, 133.0, 137.0, 96.0, 121.0]
2025-05-09 20:56:14,665 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1226 [INFO]: New best (308.17) for latency MM1Queue_a033_s075
2025-05-09 20:56:14,665 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1229 [INFO]: saving network
2025-05-09 20:56:14,669 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-sac/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 20:56:14,684 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 19 minutes, 21 seconds)
2025-05-09 20:58:43,643 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 20:58:44,737 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 259.00824 ± 143.825
2025-05-09 20:58:44,737 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [323.66156, 402.66785, 387.67163, 339.24762, 378.2039, 62.36768, 178.99364, 112.29067, 12.944351, 392.03345]
2025-05-09 20:58:44,737 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [117.0, 140.0, 138.0, 122.0, 129.0, 40.0, 82.0, 60.0, 14.0, 135.0]
2025-05-09 20:58:44,750 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 16 minutes, 58 seconds)
2025-05-09 21:01:13,504 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 21:01:14,730 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 283.72571 ± 121.896
2025-05-09 21:01:14,730 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [396.33606, 95.748886, 107.76601, 142.99118, 380.3282, 214.04448, 390.87692, 394.56473, 334.70215, 379.89835]
2025-05-09 21:01:14,730 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [135.0, 54.0, 59.0, 71.0, 142.0, 92.0, 140.0, 141.0, 120.0, 130.0]
2025-05-09 21:01:14,742 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 14 minutes, 31 seconds)
2025-05-09 21:03:41,139 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 21:03:42,425 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 306.35239 ± 104.487
2025-05-09 21:03:42,425 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [396.4665, 282.98846, 381.7069, 371.72577, 379.35156, 101.54239, 207.9648, 164.72739, 392.92328, 384.12674]
2025-05-09 21:03:42,425 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [139.0, 105.0, 130.0, 128.0, 128.0, 56.0, 92.0, 78.0, 142.0, 131.0]
2025-05-09 21:03:42,437 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 12 minutes, 6 seconds)
2025-05-09 21:06:11,495 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 21:06:12,495 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 213.86443 ± 114.145
2025-05-09 21:06:12,496 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [267.2672, 284.13614, 73.99397, 26.4639, 362.29474, 260.2425, 375.95486, 219.81436, 86.04235, 182.43437]
2025-05-09 21:06:12,496 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [114.0, 106.0, 45.0, 24.0, 125.0, 104.0, 129.0, 109.0, 52.0, 84.0]
2025-05-09 21:06:12,507 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 9 minutes, 43 seconds)
2025-05-09 21:08:41,475 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 21:08:42,522 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 236.05281 ± 145.290
2025-05-09 21:08:42,522 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [142.44063, 330.6597, 391.99603, 378.36246, 257.9581, 330.77246, 390.7067, 9.97336, 13.804276, 113.854195]
2025-05-09 21:08:42,522 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [69.0, 134.0, 142.0, 140.0, 108.0, 124.0, 135.0, 11.0, 14.0, 59.0]
2025-05-09 21:08:42,534 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 7 minutes, 18 seconds)
2025-05-09 21:11:10,473 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 21:11:11,524 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 234.75427 ± 132.792
2025-05-09 21:11:11,524 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [33.991024, 16.891336, 222.18367, 378.9645, 236.58601, 362.17993, 166.0379, 170.1322, 372.72302, 387.85324]
2025-05-09 21:11:11,524 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [28.0, 22.0, 94.0, 132.0, 96.0, 132.0, 78.0, 77.0, 134.0, 136.0]
2025-05-09 21:11:11,536 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 4 minutes, 43 seconds)
2025-05-09 21:13:39,598 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 21:13:40,409 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 170.87477 ± 142.090
2025-05-09 21:13:40,409 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [14.689911, 411.0291, 223.2717, 176.03584, 52.25539, 267.57147, 400.63058, 45.346905, 63.372078, 54.54473]
2025-05-09 21:13:40,409 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [15.0, 144.0, 94.0, 80.0, 36.0, 105.0, 139.0, 34.0, 41.0, 36.0]
2025-05-09 21:13:40,422 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 2 minutes, 8 seconds)
2025-05-09 21:16:09,870 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 21:16:10,794 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 196.84427 ± 125.611
2025-05-09 21:16:10,795 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [61.3686, 88.02614, 89.166664, 387.76346, 369.9143, 165.30643, 216.45012, 374.25888, 130.22517, 85.96298]
2025-05-09 21:16:10,795 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [39.0, 51.0, 49.0, 134.0, 135.0, 76.0, 92.0, 129.0, 67.0, 51.0]
2025-05-09 21:16:10,807 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 77/100 (estimated time remaining: 59 minutes, 52 seconds)
2025-05-09 21:18:38,589 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 21:18:39,509 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 187.45755 ± 128.747
2025-05-09 21:18:39,509 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [110.69506, 373.89474, 189.53761, 383.7666, 8.297278, 120.1007, 319.99933, 8.576706, 191.00984, 168.69748]
2025-05-09 21:18:39,509 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [73.0, 134.0, 105.0, 141.0, 11.0, 66.0, 115.0, 10.0, 85.0, 79.0]
2025-05-09 21:18:39,522 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 78/100 (estimated time remaining: 57 minutes, 16 seconds)
2025-05-09 21:21:07,049 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 21:21:07,932 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 188.92688 ± 127.910
2025-05-09 21:21:07,932 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [70.284485, 183.17462, 216.29979, 61.28704, 298.55414, 376.9452, 400.32098, 9.277251, 100.190056, 172.93529]
2025-05-09 21:21:07,932 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [44.0, 84.0, 93.0, 36.0, 111.0, 132.0, 135.0, 11.0, 63.0, 78.0]
2025-05-09 21:21:07,945 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 79/100 (estimated time remaining: 54 minutes, 39 seconds)
2025-05-09 21:23:36,856 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 21:23:37,716 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 178.23059 ± 117.630
2025-05-09 21:23:37,716 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [68.66406, 319.42432, 379.68036, 267.6942, 75.998726, 122.167885, 70.06237, 71.191696, 104.2353, 303.18695]
2025-05-09 21:23:37,716 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [44.0, 118.0, 130.0, 102.0, 47.0, 64.0, 44.0, 49.0, 59.0, 114.0]
2025-05-09 21:23:37,729 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 80/100 (estimated time remaining: 52 minutes, 14 seconds)
2025-05-09 21:26:06,505 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 21:26:07,401 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 187.95378 ± 130.742
2025-05-09 21:26:07,401 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [130.78392, 402.0328, 78.44523, 402.36456, 342.4493, 66.24935, 112.14253, 75.649956, 110.615425, 158.80475]
2025-05-09 21:26:07,401 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [69.0, 140.0, 45.0, 139.0, 121.0, 43.0, 58.0, 46.0, 59.0, 77.0]
2025-05-09 21:26:07,414 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 81/100 (estimated time remaining: 49 minutes, 47 seconds)
2025-05-09 21:28:35,716 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 21:28:36,988 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 292.20590 ± 101.824
2025-05-09 21:28:36,988 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [94.50837, 215.19942, 371.36487, 275.48602, 373.35504, 388.19937, 151.05826, 392.58643, 288.65497, 371.64633]
2025-05-09 21:28:36,988 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [58.0, 91.0, 132.0, 105.0, 128.0, 132.0, 75.0, 136.0, 109.0, 160.0]
2025-05-09 21:28:37,001 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 82/100 (estimated time remaining: 47 minutes, 15 seconds)
2025-05-09 21:31:06,069 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 21:31:07,021 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 186.62866 ± 103.843
2025-05-09 21:31:07,021 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [227.22223, 158.03154, 11.102615, 121.29178, 315.20496, 133.99452, 369.6689, 136.79034, 282.53088, 110.44882]
2025-05-09 21:31:07,021 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [95.0, 83.0, 12.0, 63.0, 148.0, 77.0, 131.0, 67.0, 106.0, 59.0]
2025-05-09 21:31:07,034 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 83/100 (estimated time remaining: 44 minutes, 51 seconds)
2025-05-09 21:33:35,321 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 21:33:36,177 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 183.86783 ± 122.187
2025-05-09 21:33:36,177 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [358.24384, 132.59802, 113.99504, 9.649417, 400.3802, 196.24117, 244.0502, 187.68057, 9.646511, 186.19348]
2025-05-09 21:33:36,177 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [126.0, 68.0, 59.0, 11.0, 139.0, 87.0, 100.0, 83.0, 11.0, 86.0]
2025-05-09 21:33:36,190 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 84/100 (estimated time remaining: 42 minutes, 24 seconds)
2025-05-09 21:36:03,850 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 21:36:04,661 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 173.77817 ± 144.557
2025-05-09 21:36:04,661 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [223.67468, 49.176056, 25.210417, 352.28497, 390.86252, 205.79823, 20.258278, 100.53295, 357.73306, 12.250597]
2025-05-09 21:36:04,662 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [93.0, 48.0, 24.0, 125.0, 136.0, 88.0, 17.0, 55.0, 126.0, 15.0]
2025-05-09 21:36:04,675 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 85/100 (estimated time remaining: 39 minutes, 50 seconds)
2025-05-09 21:38:32,940 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 21:38:33,832 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 197.37125 ± 144.687
2025-05-09 21:38:33,833 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [391.66522, 343.18985, 10.593251, 199.90567, 304.42334, 34.902985, 185.23041, 10.489562, 109.44777, 383.8644]
2025-05-09 21:38:33,833 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [134.0, 120.0, 13.0, 90.0, 111.0, 36.0, 86.0, 12.0, 60.0, 135.0]
2025-05-09 21:38:33,846 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 86/100 (estimated time remaining: 37 minutes, 19 seconds)
2025-05-09 21:41:02,201 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 21:41:03,292 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 236.25290 ± 110.005
2025-05-09 21:41:03,292 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [107.970024, 156.95764, 203.43005, 227.61319, 399.42233, 400.98148, 116.524994, 140.11316, 376.84128, 232.67485]
2025-05-09 21:41:03,292 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [58.0, 80.0, 88.0, 103.0, 150.0, 136.0, 62.0, 68.0, 134.0, 100.0]
2025-05-09 21:41:03,306 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 87/100 (estimated time remaining: 34 minutes, 49 seconds)
2025-05-09 21:43:31,739 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 21:43:32,514 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 157.02243 ± 131.014
2025-05-09 21:43:32,514 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [22.81388, 222.97414, 395.91873, 164.54236, 149.15187, 106.37494, 7.65851, 368.8292, 119.74385, 12.216791]
2025-05-09 21:43:32,514 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [23.0, 96.0, 137.0, 74.0, 74.0, 59.0, 9.0, 140.0, 65.0, 13.0]
2025-05-09 21:43:32,528 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 88/100 (estimated time remaining: 32 minutes, 18 seconds)
2025-05-09 21:46:01,869 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 21:46:03,127 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 292.15558 ± 101.825
2025-05-09 21:46:03,127 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [369.12524, 389.08844, 368.77567, 80.36005, 303.50507, 210.2784, 395.7047, 378.88184, 213.56465, 212.27177]
2025-05-09 21:46:03,127 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [130.0, 136.0, 132.0, 48.0, 116.0, 91.0, 141.0, 136.0, 92.0, 92.0]
2025-05-09 21:46:03,141 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 89/100 (estimated time remaining: 29 minutes, 52 seconds)
2025-05-09 21:48:31,675 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 21:48:32,525 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 189.02536 ± 143.629
2025-05-09 21:48:32,525 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [10.783699, 385.39572, 392.28772, 205.93813, 182.87961, 362.78745, 55.125, 9.593709, 73.024666, 212.43799]
2025-05-09 21:48:32,526 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [12.0, 135.0, 134.0, 88.0, 84.0, 126.0, 38.0, 11.0, 42.0, 91.0]
2025-05-09 21:48:32,540 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 90/100 (estimated time remaining: 27 minutes, 25 seconds)
2025-05-09 21:50:59,960 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 21:51:00,900 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 194.73830 ± 120.662
2025-05-09 21:51:00,900 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [40.002773, 309.15714, 9.662929, 345.98593, 143.63924, 91.55621, 202.18042, 147.75246, 330.47134, 326.9746]
2025-05-09 21:51:00,900 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [31.0, 113.0, 11.0, 125.0, 78.0, 53.0, 90.0, 70.0, 143.0, 117.0]
2025-05-09 21:51:00,915 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 91/100 (estimated time remaining: 24 minutes, 54 seconds)
2025-05-09 21:53:29,363 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 21:53:30,755 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 333.05542 ± 96.041
2025-05-09 21:53:30,755 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [274.6364, 376.1148, 383.6913, 355.66266, 385.7548, 66.84002, 392.5292, 399.52594, 317.46558, 378.33313]
2025-05-09 21:53:30,755 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [107.0, 139.0, 134.0, 148.0, 135.0, 43.0, 135.0, 136.0, 118.0, 134.0]
2025-05-09 21:53:30,755 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1226 [INFO]: New best (333.06) for latency MM1Queue_a033_s075
2025-05-09 21:53:30,756 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1229 [INFO]: saving network
2025-05-09 21:53:30,759 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-sac/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 21:53:30,778 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 92/100 (estimated time remaining: 22 minutes, 25 seconds)
2025-05-09 21:56:00,862 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 21:56:01,627 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 158.29283 ± 132.078
2025-05-09 21:56:01,627 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [230.66406, 70.86969, 138.37917, 8.398324, 120.314644, 382.83725, 10.295007, 62.399857, 164.46559, 394.30484]
2025-05-09 21:56:01,628 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [99.0, 47.0, 69.0, 10.0, 64.0, 130.0, 13.0, 41.0, 78.0, 134.0]
2025-05-09 21:56:01,642 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 93/100 (estimated time remaining: 19 minutes, 58 seconds)
2025-05-09 21:58:30,287 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 21:58:30,874 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 113.32127 ± 120.164
2025-05-09 21:58:30,874 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [9.77129, 21.900316, 13.430484, 264.7437, 357.01086, 12.329798, 136.30553, 217.34265, 8.344169, 92.03383]
2025-05-09 21:58:30,874 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [11.0, 22.0, 14.0, 106.0, 124.0, 22.0, 67.0, 92.0, 10.0, 56.0]
2025-05-09 21:58:30,889 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 94/100 (estimated time remaining: 17 minutes, 26 seconds)
2025-05-09 22:00:58,514 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 22:00:59,424 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 195.70824 ± 133.565
2025-05-09 22:00:59,424 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [222.16545, 125.625404, 385.9925, 77.85366, 20.002205, 343.58817, 371.0332, 145.82092, 12.649256, 252.35158]
2025-05-09 22:00:59,424 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [93.0, 65.0, 136.0, 52.0, 22.0, 126.0, 128.0, 72.0, 15.0, 102.0]
2025-05-09 22:00:59,439 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 95/100 (estimated time remaining: 14 minutes, 56 seconds)
2025-05-09 22:03:28,376 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 22:03:29,070 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 139.93552 ± 110.063
2025-05-09 22:03:29,070 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [9.398353, 249.0831, 376.34607, 69.85338, 232.3484, 145.76974, 68.474335, 111.51181, 10.718662, 125.851326]
2025-05-09 22:03:29,070 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [11.0, 100.0, 127.0, 44.0, 95.0, 71.0, 44.0, 59.0, 12.0, 64.0]
2025-05-09 22:03:29,085 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 96/100 (estimated time remaining: 12 minutes, 28 seconds)
2025-05-09 22:05:57,652 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 22:05:58,580 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 201.43771 ± 134.474
2025-05-09 22:05:58,580 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [394.62027, 186.0705, 21.945992, 252.92624, 400.20792, 312.48984, 200.02942, 171.93123, 65.92915, 8.226474]
2025-05-09 22:05:58,580 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [137.0, 83.0, 23.0, 102.0, 135.0, 127.0, 85.0, 81.0, 42.0, 10.0]
2025-05-09 22:05:58,595 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 97/100 (estimated time remaining: 9 minutes, 58 seconds)
2025-05-09 22:08:27,950 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 22:08:29,092 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 264.43256 ± 110.649
2025-05-09 22:08:29,092 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [401.6743, 83.51406, 393.4654, 224.83531, 225.61563, 236.3983, 351.0566, 122.60569, 402.14667, 203.01376]
2025-05-09 22:08:29,092 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [139.0, 48.0, 134.0, 95.0, 91.0, 97.0, 128.0, 64.0, 138.0, 86.0]
2025-05-09 22:08:29,108 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 98/100 (estimated time remaining: 7 minutes, 28 seconds)
2025-05-09 22:10:56,768 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 22:10:57,431 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 123.67242 ± 111.135
2025-05-09 22:10:57,431 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [10.726507, 46.36679, 399.22235, 103.766655, 133.1991, 12.345448, 91.649445, 141.02933, 230.87233, 67.546295]
2025-05-09 22:10:57,431 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [12.0, 41.0, 138.0, 55.0, 71.0, 13.0, 53.0, 69.0, 95.0, 43.0]
2025-05-09 22:10:57,447 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 99/100 (estimated time remaining: 4 minutes, 58 seconds)
2025-05-09 22:13:26,975 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 22:13:28,030 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 235.49873 ± 167.043
2025-05-09 22:13:28,030 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [385.36197, 25.902203, 35.954105, 406.9866, 391.21536, 154.89702, 133.98587, 18.360064, 388.33, 413.99435]
2025-05-09 22:13:28,030 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [143.0, 26.0, 34.0, 146.0, 137.0, 79.0, 67.0, 18.0, 140.0, 146.0]
2025-05-09 22:13:28,046 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 29 seconds)
2025-05-09 22:15:56,019 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 22:15:57,024 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 211.19485 ± 121.720
2025-05-09 22:15:57,024 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [306.5148, 102.5904, 337.4641, 377.66528, 112.7274, 102.709045, 13.2127495, 360.6149, 212.36781, 186.08194]
2025-05-09 22:15:57,024 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [125.0, 56.0, 123.0, 138.0, 58.0, 74.0, 16.0, 127.0, 90.0, 85.0]
2025-05-09 22:15:57,040 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1251 [DEBUG]: Training session finished
