2025-05-11 16:00:25,249 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-sac-aug-mem4
2025-05-11 16:00:25,249 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-sac-aug-mem4
2025-05-11 16:00:25,249 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x7b914ea3df70>}
2025-05-11 16:00:25,249 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1111 [DEBUG]: using device: cpu
2025-05-11 16:00:25,249 baseline-sac-noisy-hopper:77 [WARNING]: args.memorize_actions != args.horizon: 4 != 24
2025-05-11 16:00:25,258 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1133 [INFO]: Creating new trainer
2025-05-11 16:00:25,268 baseline-sac-noisy-hopper:111 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=23, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2.]]), shift: tensor([[-1., -1., -1.]]))
)
2025-05-11 16:00:25,269 baseline-sac-noisy-hopper:112 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=26, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-11 16:00:25,465 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1194 [DEBUG]: Starting training session...
2025-05-11 16:00:25,465 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 1/100
2025-05-11 16:02:56,233 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 16:02:57,080 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 118.78817 ± 61.533
2025-05-11 16:02:57,080 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [168.97072, 186.34471, 21.404808, 205.71617, 116.34436, 180.97939, 76.33437, 36.018547, 84.83629, 110.932335]
2025-05-11 16:02:57,080 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [100.0, 101.0, 22.0, 112.0, 72.0, 95.0, 45.0, 28.0, 53.0, 72.0]
2025-05-11 16:02:57,080 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1226 [INFO]: New best (118.79) for latency MM1Queue_a033_s075
2025-05-11 16:02:57,081 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1229 [INFO]: saving network
2025-05-11 16:02:57,085 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 16:02:57,091 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 2/100 (estimated time remaining: 4 hours, 10 minutes, 10 seconds)
2025-05-11 16:05:35,129 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 16:05:35,626 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 13.65074 ± 4.216
2025-05-11 16:05:35,626 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [11.085293, 21.62021, 22.153484, 10.038011, 12.512314, 12.535147, 10.231616, 11.4683, 12.834942, 12.028041]
2025-05-11 16:05:35,626 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [27.0, 50.0, 48.0, 39.0, 43.0, 43.0, 38.0, 40.0, 43.0, 40.0]
2025-05-11 16:05:35,628 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 3/100 (estimated time remaining: 4 hours, 13 minutes, 17 seconds)
2025-05-11 16:08:16,447 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 16:08:19,203 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 272.11462 ± 149.251
2025-05-11 16:08:19,203 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [187.08266, 35.576294, 314.2311, 191.04816, 521.2303, 222.05649, 80.4306, 316.93887, 440.1278, 412.42416]
2025-05-11 16:08:19,203 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [146.0, 31.0, 277.0, 157.0, 334.0, 181.0, 65.0, 287.0, 344.0, 321.0]
2025-05-11 16:08:19,204 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1226 [INFO]: New best (272.11) for latency MM1Queue_a033_s075
2025-05-11 16:08:19,204 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1229 [INFO]: saving network
2025-05-11 16:08:19,208 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 16:08:19,214 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 4/100 (estimated time remaining: 4 hours, 15 minutes, 17 seconds)
2025-05-11 16:10:59,896 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 16:11:01,057 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 163.28775 ± 45.852
2025-05-11 16:11:01,057 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [189.73398, 180.90434, 210.93716, 93.40019, 111.18766, 173.3543, 224.70198, 167.87003, 89.38777, 191.40015]
2025-05-11 16:11:01,057 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [108.0, 99.0, 128.0, 57.0, 82.0, 99.0, 118.0, 91.0, 58.0, 112.0]
2025-05-11 16:11:01,059 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 5/100 (estimated time remaining: 4 hours, 14 minutes, 14 seconds)
2025-05-11 16:13:43,155 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 16:13:44,794 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 215.67477 ± 143.177
2025-05-11 16:13:44,794 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [143.4582, 85.515076, 371.6914, 72.88138, 176.23318, 205.15878, 476.54318, 175.99504, 42.077793, 407.19373]
2025-05-11 16:13:44,795 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [80.0, 58.0, 194.0, 44.0, 130.0, 103.0, 299.0, 133.0, 37.0, 230.0]
2025-05-11 16:13:44,796 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 6/100 (estimated time remaining: 4 hours, 13 minutes, 7 seconds)
2025-05-11 16:16:26,802 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 16:16:28,576 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 246.62857 ± 109.924
2025-05-11 16:16:28,576 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [161.76631, 98.77578, 192.21892, 174.9324, 298.38416, 357.57968, 144.41135, 458.50974, 359.1469, 220.56032]
2025-05-11 16:16:28,576 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [110.0, 67.0, 96.0, 145.0, 141.0, 182.0, 97.0, 282.0, 178.0, 128.0]
2025-05-11 16:16:28,578 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 7/100 (estimated time remaining: 4 hours, 14 minutes, 15 seconds)
2025-05-11 16:19:11,831 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 16:19:14,952 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 274.07452 ± 293.710
2025-05-11 16:19:14,952 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [56.90834, 49.97657, 1043.8191, 381.49908, 237.61684, 500.48526, 198.00378, 70.006805, 128.11467, 74.3148]
2025-05-11 16:19:14,952 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [46.0, 35.0, 1000.0, 205.0, 202.0, 322.0, 199.0, 49.0, 96.0, 62.0]
2025-05-11 16:19:14,953 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1226 [INFO]: New best (274.07) for latency MM1Queue_a033_s075
2025-05-11 16:19:14,953 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1229 [INFO]: saving network
2025-05-11 16:19:14,957 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 16:19:14,964 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 8/100 (estimated time remaining: 4 hours, 13 minutes, 59 seconds)
2025-05-11 16:21:50,506 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 16:21:51,592 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 168.39713 ± 121.015
2025-05-11 16:21:51,592 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [327.48633, 139.67574, 43.792908, 123.496254, 201.05513, 88.7221, 335.48, 348.75418, 57.42317, 18.085274]
2025-05-11 16:21:51,593 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [145.0, 76.0, 32.0, 82.0, 148.0, 60.0, 143.0, 147.0, 40.0, 20.0]
2025-05-11 16:21:51,595 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 9/100 (estimated time remaining: 4 hours, 9 minutes, 7 seconds)
2025-05-11 16:24:29,226 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 16:24:31,853 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 278.73853 ± 143.082
2025-05-11 16:24:31,853 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [259.97955, 413.717, 424.5625, 121.9001, 118.67203, 38.3877, 340.23096, 215.42531, 477.23325, 377.27676]
2025-05-11 16:24:31,853 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [231.0, 238.0, 370.0, 75.0, 87.0, 36.0, 291.0, 184.0, 315.0, 240.0]
2025-05-11 16:24:31,853 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1226 [INFO]: New best (278.74) for latency MM1Queue_a033_s075
2025-05-11 16:24:31,853 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1229 [INFO]: saving network
2025-05-11 16:24:31,857 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 16:24:31,864 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 10/100 (estimated time remaining: 4 hours, 5 minutes, 56 seconds)
2025-05-11 16:27:07,638 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 16:27:09,226 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 303.13458 ± 129.377
2025-05-11 16:27:09,227 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [413.00034, 403.03696, 364.90564, 419.6807, 85.19762, 222.41692, 402.17783, 404.87253, 234.35368, 81.703575]
2025-05-11 16:27:09,227 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [176.0, 167.0, 154.0, 174.0, 53.0, 107.0, 162.0, 151.0, 110.0, 51.0]
2025-05-11 16:27:09,227 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1226 [INFO]: New best (303.13) for latency MM1Queue_a033_s075
2025-05-11 16:27:09,227 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1229 [INFO]: saving network
2025-05-11 16:27:09,232 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 16:27:09,239 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 11/100 (estimated time remaining: 4 hours, 1 minute, 19 seconds)
2025-05-11 16:29:44,190 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 16:29:45,779 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 306.54999 ± 127.755
2025-05-11 16:29:45,779 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [402.2703, 154.28693, 406.53387, 396.0961, 408.09125, 191.80519, 441.97473, 319.1755, 304.93723, 40.328724]
2025-05-11 16:29:45,779 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [159.0, 84.0, 159.0, 147.0, 154.0, 106.0, 181.0, 144.0, 139.0, 32.0]
2025-05-11 16:29:45,779 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1226 [INFO]: New best (306.55) for latency MM1Queue_a033_s075
2025-05-11 16:29:45,779 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1229 [INFO]: saving network
2025-05-11 16:29:45,783 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 16:29:45,791 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 12/100 (estimated time remaining: 3 hours, 56 minutes, 30 seconds)
2025-05-11 16:32:25,881 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 16:32:27,554 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 279.54315 ± 164.130
2025-05-11 16:32:27,555 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [29.0253, 395.0019, 450.8327, 259.38452, 42.80838, 412.7498, 313.7237, 64.64717, 356.81323, 470.44516]
2025-05-11 16:32:27,555 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [23.0, 198.0, 196.0, 113.0, 31.0, 184.0, 178.0, 43.0, 168.0, 223.0]
2025-05-11 16:32:27,558 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 13/100 (estimated time remaining: 3 hours, 52 minutes, 29 seconds)
2025-05-11 16:35:06,890 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 16:35:08,669 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 369.67755 ± 115.774
2025-05-11 16:35:08,669 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [425.29498, 440.88745, 413.3843, 32.22454, 446.01367, 402.05786, 361.74112, 359.74365, 416.58325, 398.84464]
2025-05-11 16:35:08,669 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [173.0, 166.0, 163.0, 25.0, 172.0, 152.0, 143.0, 143.0, 168.0, 155.0]
2025-05-11 16:35:08,669 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1226 [INFO]: New best (369.68) for latency MM1Queue_a033_s075
2025-05-11 16:35:08,669 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1229 [INFO]: saving network
2025-05-11 16:35:08,673 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 16:35:08,682 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 14/100 (estimated time remaining: 3 hours, 51 minutes, 9 seconds)
2025-05-11 16:37:48,996 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 16:37:50,792 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 375.24707 ± 94.611
2025-05-11 16:37:50,792 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [416.47058, 427.6697, 418.2293, 403.23987, 413.26807, 335.74658, 422.90106, 399.0469, 101.30637, 414.59216]
2025-05-11 16:37:50,792 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [158.0, 166.0, 151.0, 159.0, 153.0, 146.0, 158.0, 160.0, 63.0, 163.0]
2025-05-11 16:37:50,792 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1226 [INFO]: New best (375.25) for latency MM1Queue_a033_s075
2025-05-11 16:37:50,793 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1229 [INFO]: saving network
2025-05-11 16:37:50,797 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 16:37:50,805 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 15/100 (estimated time remaining: 3 hours, 49 minutes, 1 second)
2025-05-11 16:40:29,883 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 16:40:31,555 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 342.29279 ± 124.402
2025-05-11 16:40:31,555 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [405.90036, 423.21194, 253.77711, 422.97537, 289.24854, 419.35056, 16.989697, 443.67526, 412.43408, 335.36502]
2025-05-11 16:40:31,555 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [152.0, 159.0, 116.0, 166.0, 126.0, 165.0, 15.0, 170.0, 155.0, 146.0]
2025-05-11 16:40:31,559 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 16/100 (estimated time remaining: 3 hours, 47 minutes, 19 seconds)
2025-05-11 16:43:39,890 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 16:43:42,164 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 377.11469 ± 93.504
2025-05-11 16:43:42,165 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [373.851, 410.3353, 418.72025, 410.78897, 99.142105, 406.23544, 404.57834, 416.29587, 409.21042, 421.98895]
2025-05-11 16:43:42,165 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [152.0, 153.0, 158.0, 156.0, 62.0, 162.0, 151.0, 158.0, 151.0, 166.0]
2025-05-11 16:43:42,165 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1226 [INFO]: New best (377.11) for latency MM1Queue_a033_s075
2025-05-11 16:43:42,165 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1229 [INFO]: saving network
2025-05-11 16:43:42,171 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 16:43:42,182 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 17/100 (estimated time remaining: 3 hours, 54 minutes, 11 seconds)
2025-05-11 16:47:03,460 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 16:47:05,560 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 342.48044 ± 128.974
2025-05-11 16:47:05,560 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [419.22424, 395.30853, 396.2881, 127.918755, 408.03796, 408.5887, 401.19507, 417.95605, 47.124825, 403.16217]
2025-05-11 16:47:05,560 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [163.0, 149.0, 154.0, 83.0, 153.0, 153.0, 151.0, 167.0, 29.0, 155.0]
2025-05-11 16:47:05,566 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 18/100 (estimated time remaining: 4 hours, 2 minutes, 54 seconds)
2025-05-11 16:50:25,530 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 16:50:27,983 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 413.30460 ± 19.369
2025-05-11 16:50:27,983 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [369.79068, 423.28772, 421.15305, 390.40295, 407.52878, 431.03122, 427.95047, 437.10214, 407.79852, 417.00027]
2025-05-11 16:50:27,983 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [146.0, 161.0, 162.0, 147.0, 157.0, 170.0, 166.0, 162.0, 153.0, 158.0]
2025-05-11 16:50:27,983 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1226 [INFO]: New best (413.30) for latency MM1Queue_a033_s075
2025-05-11 16:50:27,984 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1229 [INFO]: saving network
2025-05-11 16:50:27,990 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 16:50:28,001 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 19/100 (estimated time remaining: 4 hours, 11 minutes, 16 seconds)
2025-05-11 16:53:22,174 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 16:53:23,909 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 383.55887 ± 68.150
2025-05-11 16:53:23,909 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [197.21802, 354.3431, 440.7126, 367.78128, 402.76474, 431.4704, 415.6133, 378.09616, 437.81512, 409.774]
2025-05-11 16:53:23,909 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [102.0, 143.0, 167.0, 141.0, 152.0, 165.0, 160.0, 142.0, 169.0, 154.0]
2025-05-11 16:53:23,913 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 20/100 (estimated time remaining: 4 hours, 11 minutes, 56 seconds)
2025-05-11 16:56:04,033 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 16:56:05,711 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 340.23529 ± 118.985
2025-05-11 16:56:05,711 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [416.1898, 274.20605, 350.19693, 410.3089, 416.33633, 416.75827, 367.02246, 313.19305, 423.98218, 14.158928]
2025-05-11 16:56:05,711 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [160.0, 122.0, 145.0, 155.0, 154.0, 160.0, 170.0, 131.0, 163.0, 16.0]
2025-05-11 16:56:05,715 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 21/100 (estimated time remaining: 4 hours, 9 minutes, 6 seconds)
2025-05-11 16:58:46,026 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 16:58:48,131 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 412.92880 ± 36.857
2025-05-11 16:58:48,131 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [447.41315, 312.7289, 437.02493, 425.42627, 424.7054, 433.0831, 430.5208, 403.91336, 389.66238, 424.80966]
2025-05-11 16:58:48,131 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [179.0, 156.0, 157.0, 160.0, 162.0, 169.0, 164.0, 147.0, 153.0, 161.0]
2025-05-11 16:58:48,136 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 22/100 (estimated time remaining: 3 hours, 58 minutes, 34 seconds)
2025-05-11 17:01:33,231 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 17:01:35,337 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 417.49863 ± 9.173
2025-05-11 17:01:35,338 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [422.64877, 430.21332, 420.14996, 413.7524, 406.39893, 418.98465, 399.33212, 422.79193, 428.49567, 412.21857]
2025-05-11 17:01:35,338 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [175.0, 165.0, 160.0, 164.0, 158.0, 159.0, 146.0, 158.0, 163.0, 156.0]
2025-05-11 17:01:35,338 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1226 [INFO]: New best (417.50) for latency MM1Queue_a033_s075
2025-05-11 17:01:35,338 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1229 [INFO]: saving network
2025-05-11 17:01:35,343 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 17:01:35,353 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 23/100 (estimated time remaining: 3 hours, 46 minutes, 8 seconds)
2025-05-11 17:04:24,416 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 17:04:26,260 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 368.73236 ± 102.426
2025-05-11 17:04:26,260 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [433.375, 327.9354, 72.526115, 392.14316, 405.22324, 401.96573, 412.77008, 399.9541, 424.42944, 417.00113]
2025-05-11 17:04:26,260 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [166.0, 131.0, 49.0, 143.0, 150.0, 152.0, 158.0, 152.0, 157.0, 154.0]
2025-05-11 17:04:26,266 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 24/100 (estimated time remaining: 3 hours, 35 minutes, 9 seconds)
2025-05-11 17:07:12,794 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 17:07:14,764 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 391.15317 ± 56.756
2025-05-11 17:07:14,764 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [408.11087, 442.24725, 246.04878, 325.70755, 411.48553, 422.41095, 395.1481, 419.95758, 425.29053, 415.12457]
2025-05-11 17:07:14,764 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [152.0, 173.0, 110.0, 133.0, 151.0, 162.0, 148.0, 159.0, 166.0, 154.0]
2025-05-11 17:07:14,769 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 25/100 (estimated time remaining: 3 hours, 30 minutes, 29 seconds)
2025-05-11 17:10:01,836 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 17:10:03,806 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 372.63199 ± 63.684
2025-05-11 17:10:03,806 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [218.8928, 436.6492, 387.52185, 412.17633, 418.1405, 307.45053, 374.76627, 408.0023, 422.03976, 340.68015]
2025-05-11 17:10:03,806 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [106.0, 168.0, 147.0, 154.0, 157.0, 145.0, 143.0, 154.0, 169.0, 155.0]
2025-05-11 17:10:03,812 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 26/100 (estimated time remaining: 3 hours, 29 minutes, 31 seconds)
2025-05-11 17:12:51,349 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 17:12:53,838 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 505.22290 ± 104.171
2025-05-11 17:12:53,838 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [392.93143, 727.8329, 416.71915, 563.48456, 484.66364, 578.20715, 419.4322, 608.7359, 433.02707, 427.1953]
2025-05-11 17:12:53,838 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [168.0, 250.0, 158.0, 207.0, 181.0, 204.0, 159.0, 223.0, 172.0, 163.0]
2025-05-11 17:12:53,838 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1226 [INFO]: New best (505.22) for latency MM1Queue_a033_s075
2025-05-11 17:12:53,839 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1229 [INFO]: saving network
2025-05-11 17:12:53,843 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 17:12:53,855 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 27/100 (estimated time remaining: 3 hours, 28 minutes, 36 seconds)
2025-05-11 17:15:41,229 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 17:15:43,129 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 373.20703 ± 120.400
2025-05-11 17:15:43,129 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [138.41545, 575.0287, 419.8603, 414.99054, 415.63086, 412.84247, 181.47495, 396.3914, 433.60916, 343.8267]
2025-05-11 17:15:43,129 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [83.0, 204.0, 158.0, 151.0, 154.0, 155.0, 98.0, 155.0, 168.0, 133.0]
2025-05-11 17:15:43,135 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 28/100 (estimated time remaining: 3 hours, 26 minutes, 17 seconds)
2025-05-11 17:18:30,999 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 17:18:34,022 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 637.71136 ± 177.625
2025-05-11 17:18:34,023 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [686.5851, 729.0687, 721.6009, 784.6118, 789.36237, 541.07544, 289.4362, 346.9039, 831.5849, 656.88416]
2025-05-11 17:18:34,023 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [266.0, 262.0, 250.0, 245.0, 250.0, 202.0, 125.0, 150.0, 295.0, 230.0]
2025-05-11 17:18:34,023 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1226 [INFO]: New best (637.71) for latency MM1Queue_a033_s075
2025-05-11 17:18:34,023 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1229 [INFO]: saving network
2025-05-11 17:18:34,027 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 17:18:34,039 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 29/100 (estimated time remaining: 3 hours, 23 minutes, 27 seconds)
2025-05-11 17:21:21,849 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 17:21:24,601 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 589.85669 ± 130.908
2025-05-11 17:21:24,601 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [763.94476, 684.0943, 734.0153, 679.0249, 648.5831, 625.2682, 401.4265, 419.29175, 526.7449, 416.17355]
2025-05-11 17:21:24,601 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [268.0, 247.0, 239.0, 243.0, 220.0, 211.0, 153.0, 158.0, 190.0, 157.0]
2025-05-11 17:21:24,608 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 30/100 (estimated time remaining: 3 hours, 21 minutes, 7 seconds)
2025-05-11 17:24:11,317 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 17:24:13,731 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 513.19940 ± 151.584
2025-05-11 17:24:13,731 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [652.0356, 281.549, 422.1027, 393.6917, 563.8209, 700.75476, 665.50714, 447.00946, 688.12915, 317.3937]
2025-05-11 17:24:13,732 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [219.0, 126.0, 156.0, 147.0, 193.0, 235.0, 231.0, 166.0, 225.0, 142.0]
2025-05-11 17:24:13,738 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 31/100 (estimated time remaining: 3 hours, 18 minutes, 18 seconds)
2025-05-11 17:27:02,575 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 17:27:05,215 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 576.11115 ± 135.587
2025-05-11 17:27:05,215 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [627.35596, 619.99384, 331.28262, 690.4787, 693.75336, 672.4799, 339.03326, 676.6227, 465.77203, 644.33887]
2025-05-11 17:27:05,215 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [200.0, 230.0, 144.0, 223.0, 225.0, 221.0, 141.0, 229.0, 171.0, 228.0]
2025-05-11 17:27:05,222 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 32/100 (estimated time remaining: 3 hours, 15 minutes, 48 seconds)
2025-05-11 17:29:52,196 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 17:29:55,365 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 632.51764 ± 288.336
2025-05-11 17:29:55,366 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [627.0441, 656.7269, 649.9648, 319.03424, 728.62836, 1230.6687, 822.6237, 67.77278, 679.1448, 543.5678]
2025-05-11 17:29:55,366 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [248.0, 240.0, 203.0, 127.0, 264.0, 472.0, 282.0, 44.0, 262.0, 219.0]
2025-05-11 17:29:55,373 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 33/100 (estimated time remaining: 3 hours, 13 minutes, 10 seconds)
2025-05-11 17:32:46,594 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 17:32:49,813 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 561.12854 ± 295.872
2025-05-11 17:32:49,813 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [1101.2743, 403.60236, 479.22794, 651.0814, 417.83972, 256.8072, 1136.596, 424.1668, 306.42142, 434.2681]
2025-05-11 17:32:49,813 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [442.0, 175.0, 223.0, 250.0, 173.0, 118.0, 479.0, 185.0, 143.0, 180.0]
2025-05-11 17:32:49,820 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 34/100 (estimated time remaining: 3 hours, 11 minutes, 7 seconds)
2025-05-11 17:35:34,199 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 17:35:37,110 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 599.82220 ± 216.204
2025-05-11 17:35:37,110 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [380.3453, 691.43677, 808.87244, 804.88055, 289.39752, 705.4879, 615.41144, 800.5088, 708.5728, 193.30846]
2025-05-11 17:35:37,110 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [161.0, 237.0, 287.0, 278.0, 140.0, 251.0, 240.0, 254.0, 242.0, 106.0]
2025-05-11 17:35:37,118 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 35/100 (estimated time remaining: 3 hours, 7 minutes, 33 seconds)
2025-05-11 17:38:25,598 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 17:38:28,596 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 609.94897 ± 142.304
2025-05-11 17:38:28,597 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [264.2928, 747.32965, 682.7449, 646.63824, 705.884, 631.6112, 664.9753, 416.19397, 663.83014, 675.98987]
2025-05-11 17:38:28,597 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [122.0, 266.0, 247.0, 249.0, 256.0, 238.0, 241.0, 167.0, 255.0, 225.0]
2025-05-11 17:38:28,604 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 36/100 (estimated time remaining: 3 hours, 5 minutes, 13 seconds)
2025-05-11 17:41:16,002 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 17:41:18,793 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 595.58594 ± 156.075
2025-05-11 17:41:18,793 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [647.1899, 717.8839, 546.26105, 705.65234, 558.1393, 719.145, 539.8207, 569.6577, 761.00995, 191.0994]
2025-05-11 17:41:18,793 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [234.0, 244.0, 205.0, 234.0, 203.0, 236.0, 203.0, 199.0, 247.0, 112.0]
2025-05-11 17:41:18,800 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 37/100 (estimated time remaining: 3 hours, 2 minutes, 5 seconds)
2025-05-11 17:44:05,466 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 17:44:08,695 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 588.43176 ± 208.753
2025-05-11 17:44:08,695 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [613.4818, 625.8662, 295.22495, 623.59625, 1047.7744, 621.9987, 665.30035, 541.15936, 234.13036, 615.7847]
2025-05-11 17:44:08,696 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [253.0, 271.0, 137.0, 249.0, 428.0, 254.0, 225.0, 212.0, 115.0, 254.0]
2025-05-11 17:44:08,704 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 59 minutes, 11 seconds)
2025-05-11 17:46:55,166 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 17:46:57,809 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 616.72327 ± 100.074
2025-05-11 17:46:57,809 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [652.08325, 674.8888, 492.6993, 387.47275, 678.20966, 544.31085, 690.28204, 672.858, 692.3329, 682.0948]
2025-05-11 17:46:57,809 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [238.0, 245.0, 196.0, 164.0, 238.0, 203.0, 244.0, 232.0, 238.0, 239.0]
2025-05-11 17:46:57,817 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 55 minutes, 15 seconds)
2025-05-11 17:49:35,185 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 17:49:37,806 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 648.85034 ± 108.870
2025-05-11 17:49:37,807 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [543.40674, 615.6556, 705.68616, 709.9111, 860.4301, 417.6252, 679.9784, 644.32697, 651.7674, 659.71515]
2025-05-11 17:49:37,807 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [191.0, 208.0, 245.0, 249.0, 283.0, 162.0, 219.0, 220.0, 220.0, 228.0]
2025-05-11 17:49:37,807 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1226 [INFO]: New best (648.85) for latency MM1Queue_a033_s075
2025-05-11 17:49:37,807 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1229 [INFO]: saving network
2025-05-11 17:49:37,811 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 17:49:37,822 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 50 minutes, 56 seconds)
2025-05-11 17:52:17,219 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 17:52:19,372 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 499.45215 ± 187.232
2025-05-11 17:52:19,372 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [222.33382, 695.0024, 622.3396, 654.07074, 256.6733, 358.38785, 616.7881, 603.63086, 697.6933, 267.60144]
2025-05-11 17:52:19,372 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [104.0, 237.0, 192.0, 204.0, 114.0, 150.0, 196.0, 199.0, 229.0, 108.0]
2025-05-11 17:52:19,380 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 41/100 (estimated time remaining: 2 hours, 46 minutes, 9 seconds)
2025-05-11 17:54:59,561 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 17:55:02,075 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 587.72174 ± 129.813
2025-05-11 17:55:02,075 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [716.4816, 649.79596, 412.1195, 659.2725, 678.95026, 672.9209, 646.92896, 668.6915, 335.4252, 436.63095]
2025-05-11 17:55:02,075 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [231.0, 229.0, 160.0, 222.0, 225.0, 220.0, 218.0, 228.0, 145.0, 165.0]
2025-05-11 17:55:02,083 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 42/100 (estimated time remaining: 2 hours, 41 minutes, 54 seconds)
2025-05-11 17:57:44,357 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 17:57:46,531 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 501.57065 ± 212.307
2025-05-11 17:57:46,531 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [683.1949, 419.76154, 265.61243, 638.9727, 677.282, 93.36815, 255.0065, 692.2304, 662.94244, 627.3355]
2025-05-11 17:57:46,531 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [235.0, 165.0, 110.0, 205.0, 220.0, 53.0, 113.0, 232.0, 234.0, 207.0]
2025-05-11 17:57:46,539 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 43/100 (estimated time remaining: 2 hours, 38 minutes, 6 seconds)
2025-05-11 18:00:24,742 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 18:00:27,339 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 602.21112 ± 141.597
2025-05-11 18:00:27,339 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [698.619, 684.2838, 497.16104, 564.9711, 315.16504, 724.35657, 645.38104, 514.6852, 847.21857, 530.26965]
2025-05-11 18:00:27,340 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [234.0, 229.0, 177.0, 201.0, 137.0, 257.0, 228.0, 179.0, 279.0, 188.0]
2025-05-11 18:00:27,348 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 44/100 (estimated time remaining: 2 hours, 33 minutes, 48 seconds)
2025-05-11 18:03:07,940 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 18:03:10,757 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 649.32184 ± 169.381
2025-05-11 18:03:10,757 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [697.061, 675.48267, 463.87335, 301.94363, 643.70575, 699.37616, 642.9783, 998.5648, 714.8059, 655.42645]
2025-05-11 18:03:10,757 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [232.0, 239.0, 173.0, 129.0, 222.0, 222.0, 224.0, 361.0, 234.0, 234.0]
2025-05-11 18:03:10,758 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1226 [INFO]: New best (649.32) for latency MM1Queue_a033_s075
2025-05-11 18:03:10,758 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1229 [INFO]: saving network
2025-05-11 18:03:10,762 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 18:03:10,775 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 45/100 (estimated time remaining: 2 hours, 31 minutes, 45 seconds)
2025-05-11 18:05:50,455 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 18:05:53,054 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 609.58002 ± 120.409
2025-05-11 18:05:53,054 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [686.6154, 697.1556, 623.41187, 692.6867, 525.2654, 638.6001, 279.03323, 665.3132, 615.4634, 672.25555]
2025-05-11 18:05:53,054 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [228.0, 231.0, 216.0, 223.0, 197.0, 236.0, 119.0, 203.0, 235.0, 229.0]
2025-05-11 18:05:53,063 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 46/100 (estimated time remaining: 2 hours, 29 minutes, 10 seconds)
2025-05-11 18:08:32,273 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 18:08:35,140 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 653.75739 ± 175.966
2025-05-11 18:08:35,140 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [262.14694, 782.80023, 726.3064, 489.2621, 666.77435, 980.3133, 646.5654, 651.47723, 674.48395, 657.44366]
2025-05-11 18:08:35,141 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [113.0, 285.0, 226.0, 183.0, 235.0, 356.0, 232.0, 232.0, 233.0, 227.0]
2025-05-11 18:08:35,141 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1226 [INFO]: New best (653.76) for latency MM1Queue_a033_s075
2025-05-11 18:08:35,141 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1229 [INFO]: saving network
2025-05-11 18:08:35,145 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 18:08:35,158 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 47/100 (estimated time remaining: 2 hours, 26 minutes, 21 seconds)
2025-05-11 18:11:15,597 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 18:11:18,010 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 555.72913 ± 168.822
2025-05-11 18:11:18,010 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [678.80493, 715.4013, 671.33044, 140.33757, 445.6618, 636.31744, 400.0169, 649.81555, 588.6323, 630.9725]
2025-05-11 18:11:18,010 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [230.0, 235.0, 220.0, 74.0, 165.0, 215.0, 150.0, 215.0, 249.0, 216.0]
2025-05-11 18:11:18,018 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 48/100 (estimated time remaining: 2 hours, 23 minutes, 21 seconds)
2025-05-11 18:13:55,700 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 18:13:58,226 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 581.76764 ± 169.678
2025-05-11 18:13:58,226 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [711.77026, 698.26215, 712.7951, 348.01913, 698.7012, 696.6775, 313.1546, 687.37946, 312.91568, 638.00165]
2025-05-11 18:13:58,226 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [241.0, 234.0, 236.0, 140.0, 239.0, 231.0, 131.0, 231.0, 136.0, 239.0]
2025-05-11 18:13:58,236 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 49/100 (estimated time remaining: 2 hours, 20 minutes, 33 seconds)
2025-05-11 18:16:40,743 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 18:16:42,954 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 499.18878 ± 224.317
2025-05-11 18:16:42,954 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [702.1772, 727.51385, 681.8458, 373.4826, 663.8924, 742.14264, 261.06378, 199.64903, 498.83838, 141.28166]
2025-05-11 18:16:42,954 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [230.0, 239.0, 224.0, 149.0, 228.0, 256.0, 114.0, 98.0, 189.0, 75.0]
2025-05-11 18:16:42,963 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 50/100 (estimated time remaining: 2 hours, 18 minutes, 4 seconds)
2025-05-11 18:19:19,745 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 18:19:22,473 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 624.66364 ± 143.173
2025-05-11 18:19:22,473 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [707.5023, 717.2451, 684.73, 741.851, 603.8513, 462.54938, 680.2529, 667.9688, 718.86835, 261.8171]
2025-05-11 18:19:22,473 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [241.0, 244.0, 227.0, 243.0, 234.0, 181.0, 242.0, 230.0, 248.0, 119.0]
2025-05-11 18:19:22,483 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 51/100 (estimated time remaining: 2 hours, 14 minutes, 54 seconds)
2025-05-11 18:22:06,682 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 18:22:09,634 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 635.70935 ± 96.027
2025-05-11 18:22:09,634 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [680.16705, 692.30176, 434.97083, 470.9991, 685.9619, 712.99615, 609.76373, 712.9687, 658.3839, 698.581]
2025-05-11 18:22:09,635 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [240.0, 235.0, 170.0, 186.0, 233.0, 250.0, 234.0, 242.0, 230.0, 223.0]
2025-05-11 18:22:09,646 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 52/100 (estimated time remaining: 2 hours, 13 minutes, 1 second)
2025-05-11 18:24:54,124 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 18:24:56,583 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 521.56116 ± 246.816
2025-05-11 18:24:56,583 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [714.1971, 297.12546, 737.0368, 604.04144, 278.20166, 117.50157, 712.9345, 697.8291, 229.88295, 826.86127]
2025-05-11 18:24:56,584 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [236.0, 128.0, 249.0, 214.0, 121.0, 74.0, 244.0, 228.0, 105.0, 264.0]
2025-05-11 18:24:56,594 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 53/100 (estimated time remaining: 2 hours, 10 minutes, 58 seconds)
2025-05-11 18:27:42,723 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 18:27:45,671 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 647.21484 ± 124.465
2025-05-11 18:27:45,671 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [831.46814, 503.4004, 508.07147, 705.96094, 727.0799, 665.1283, 686.6552, 406.3641, 697.8062, 740.2136]
2025-05-11 18:27:45,671 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [302.0, 178.0, 183.0, 231.0, 251.0, 214.0, 228.0, 158.0, 233.0, 246.0]
2025-05-11 18:27:45,682 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 54/100 (estimated time remaining: 2 hours, 9 minutes, 37 seconds)
2025-05-11 18:30:35,706 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 18:30:38,207 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 524.89221 ± 182.460
2025-05-11 18:30:38,207 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [265.51538, 774.0779, 451.75186, 402.369, 722.38525, 648.0327, 692.4509, 646.9586, 368.8976, 276.48276]
2025-05-11 18:30:38,207 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [118.0, 268.0, 174.0, 159.0, 256.0, 209.0, 240.0, 217.0, 142.0, 120.0]
2025-05-11 18:30:38,219 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 55/100 (estimated time remaining: 2 hours, 8 minutes, 4 seconds)
2025-05-11 18:33:23,526 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 18:33:26,366 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 632.41882 ± 121.120
2025-05-11 18:33:26,367 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [713.69415, 713.5916, 359.34274, 720.94354, 692.50397, 537.15265, 709.7232, 479.06485, 703.0443, 695.1272]
2025-05-11 18:33:26,367 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [239.0, 234.0, 152.0, 241.0, 233.0, 193.0, 233.0, 178.0, 231.0, 221.0]
2025-05-11 18:33:26,378 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 56/100 (estimated time remaining: 2 hours, 6 minutes, 35 seconds)
2025-05-11 18:36:17,814 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 18:36:20,653 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 629.36963 ± 163.714
2025-05-11 18:36:20,653 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [708.15625, 263.19507, 431.6341, 649.7351, 798.1017, 619.9275, 785.04816, 784.8657, 701.43384, 551.59937]
2025-05-11 18:36:20,653 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [241.0, 118.0, 172.0, 233.0, 251.0, 201.0, 247.0, 252.0, 241.0, 195.0]
2025-05-11 18:36:20,664 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 57/100 (estimated time remaining: 2 hours, 4 minutes, 48 seconds)
2025-05-11 18:39:27,819 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 18:39:31,530 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 731.50690 ± 75.375
2025-05-11 18:39:31,530 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [704.0945, 679.6333, 857.3645, 739.43274, 699.8958, 702.0642, 699.2881, 704.3746, 638.65186, 890.26935]
2025-05-11 18:39:31,530 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [235.0, 235.0, 266.0, 237.0, 237.0, 233.0, 243.0, 234.0, 230.0, 301.0]
2025-05-11 18:39:31,531 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1226 [INFO]: New best (731.51) for latency MM1Queue_a033_s075
2025-05-11 18:39:31,531 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1229 [INFO]: saving network
2025-05-11 18:39:31,536 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 18:39:31,556 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 58/100 (estimated time remaining: 2 hours, 5 minutes, 24 seconds)
2025-05-11 18:42:52,287 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 18:42:55,474 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 598.68884 ± 187.238
2025-05-11 18:42:55,474 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [721.1641, 161.51039, 631.81335, 698.04224, 730.4775, 677.743, 304.99832, 679.6922, 683.1694, 698.278]
2025-05-11 18:42:55,474 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [242.0, 89.0, 222.0, 241.0, 239.0, 237.0, 131.0, 226.0, 239.0, 246.0]
2025-05-11 18:42:55,488 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 59/100 (estimated time remaining: 2 hours, 7 minutes, 22 seconds)
2025-05-11 18:46:15,452 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 18:46:18,687 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 617.18494 ± 146.814
2025-05-11 18:46:18,688 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [686.133, 670.85364, 710.6976, 697.73975, 690.82983, 359.89133, 701.81586, 675.3787, 292.06104, 686.4483]
2025-05-11 18:46:18,688 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [231.0, 232.0, 245.0, 239.0, 226.0, 147.0, 235.0, 232.0, 132.0, 235.0]
2025-05-11 18:46:18,731 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 8 minutes, 32 seconds)
2025-05-11 18:49:00,159 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 18:49:02,868 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 609.54559 ± 163.000
2025-05-11 18:49:02,868 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [720.8122, 656.00244, 665.9481, 251.08427, 856.0641, 547.6963, 502.72186, 731.54517, 698.0697, 465.5115]
2025-05-11 18:49:02,868 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [242.0, 238.0, 244.0, 114.0, 273.0, 197.0, 194.0, 240.0, 246.0, 178.0]
2025-05-11 18:49:02,880 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 61/100 (estimated time remaining: 2 hours, 4 minutes, 52 seconds)
2025-05-11 18:51:39,678 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 18:51:42,301 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 623.29022 ± 152.748
2025-05-11 18:51:42,301 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [707.96735, 704.3739, 301.8534, 690.0577, 692.5166, 771.25977, 714.00946, 654.818, 348.85748, 647.18866]
2025-05-11 18:51:42,302 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [232.0, 231.0, 129.0, 225.0, 225.0, 245.0, 229.0, 214.0, 150.0, 219.0]
2025-05-11 18:51:42,313 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 59 minutes, 48 seconds)
2025-05-11 18:54:18,382 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 18:54:20,677 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 513.28992 ± 169.535
2025-05-11 18:54:20,677 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [714.2238, 683.8645, 289.5343, 504.09903, 454.0369, 293.2184, 732.4862, 682.0894, 455.83466, 323.51205]
2025-05-11 18:54:20,677 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [233.0, 230.0, 124.0, 188.0, 172.0, 126.0, 237.0, 229.0, 174.0, 138.0]
2025-05-11 18:54:20,688 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 52 minutes, 37 seconds)
2025-05-11 18:56:59,477 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 18:57:02,323 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 644.12201 ± 97.542
2025-05-11 18:57:02,323 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [725.2446, 693.86127, 701.38806, 469.62094, 699.29913, 713.9982, 598.9039, 453.17447, 668.6337, 717.09576]
2025-05-11 18:57:02,323 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [247.0, 229.0, 245.0, 182.0, 228.0, 238.0, 226.0, 181.0, 246.0, 245.0]
2025-05-11 18:57:02,335 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 44 minutes, 26 seconds)
2025-05-11 18:59:44,976 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 18:59:47,927 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 688.29315 ± 206.925
2025-05-11 18:59:47,928 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [701.4777, 727.5089, 698.25586, 882.2495, 953.72675, 709.56067, 690.7581, 669.9333, 125.925446, 723.5356]
2025-05-11 18:59:47,928 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [237.0, 247.0, 235.0, 289.0, 318.0, 233.0, 237.0, 228.0, 77.0, 244.0]
2025-05-11 18:59:47,940 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 37 minutes, 6 seconds)
2025-05-11 19:02:25,939 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 19:02:28,730 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 655.78583 ± 130.623
2025-05-11 19:02:28,730 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [701.28064, 709.2373, 715.82294, 678.6069, 705.4065, 711.7733, 276.90256, 609.7701, 745.1761, 703.8819]
2025-05-11 19:02:28,730 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [236.0, 239.0, 241.0, 230.0, 240.0, 238.0, 124.0, 201.0, 245.0, 235.0]
2025-05-11 19:02:28,743 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 34 minutes, 1 second)
2025-05-11 19:05:05,963 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 19:05:08,520 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 585.37317 ± 176.049
2025-05-11 19:05:08,520 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [765.97107, 719.12244, 538.4198, 695.12915, 724.1237, 281.5295, 722.54193, 690.42096, 375.1433, 341.32956]
2025-05-11 19:05:08,520 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [256.0, 249.0, 202.0, 229.0, 237.0, 123.0, 224.0, 227.0, 154.0, 149.0]
2025-05-11 19:05:08,533 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 31 minutes, 22 seconds)
2025-05-11 19:07:45,980 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 19:07:48,650 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 650.69910 ± 112.724
2025-05-11 19:07:48,651 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [672.3127, 684.7434, 681.78625, 532.08936, 685.7123, 685.55414, 658.8733, 383.25043, 684.23004, 838.43933]
2025-05-11 19:07:48,651 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [235.0, 238.0, 234.0, 180.0, 243.0, 236.0, 233.0, 152.0, 235.0, 272.0]
2025-05-11 19:07:48,662 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 28 minutes, 52 seconds)
2025-05-11 19:10:25,670 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 19:10:27,738 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 485.12711 ± 211.837
2025-05-11 19:10:27,738 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [671.81165, 451.71234, 510.43335, 686.48346, 255.12224, 722.84735, 257.61636, 304.1515, 793.41473, 197.67809]
2025-05-11 19:10:27,738 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [237.0, 176.0, 189.0, 238.0, 112.0, 230.0, 111.0, 129.0, 246.0, 99.0]
2025-05-11 19:10:27,749 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 25 minutes, 54 seconds)
2025-05-11 19:13:01,491 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 19:13:04,110 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 645.57489 ± 133.894
2025-05-11 19:13:04,110 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [808.42474, 813.7914, 697.64996, 545.0983, 679.596, 333.07886, 580.3438, 596.2852, 706.6433, 694.83704]
2025-05-11 19:13:04,110 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [277.0, 257.0, 231.0, 196.0, 226.0, 138.0, 218.0, 214.0, 236.0, 230.0]
2025-05-11 19:13:04,122 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 22 minutes, 16 seconds)
2025-05-11 19:15:39,636 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 19:15:42,188 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 624.17786 ± 175.318
2025-05-11 19:15:42,188 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [798.85065, 733.32635, 540.31964, 774.52277, 709.8723, 755.2192, 410.75803, 598.4142, 224.84113, 695.65356]
2025-05-11 19:15:42,188 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [259.0, 240.0, 202.0, 252.0, 232.0, 245.0, 164.0, 219.0, 106.0, 239.0]
2025-05-11 19:15:42,200 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 19 minutes, 20 seconds)
2025-05-11 19:18:20,521 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 19:18:23,345 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 695.17743 ± 124.868
2025-05-11 19:18:23,345 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [808.94196, 713.42194, 690.858, 792.3568, 687.7068, 853.9711, 725.7844, 734.50104, 526.50653, 417.72604]
2025-05-11 19:18:23,345 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [255.0, 241.0, 247.0, 267.0, 229.0, 290.0, 244.0, 242.0, 208.0, 169.0]
2025-05-11 19:18:23,357 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 16 minutes, 49 seconds)
2025-05-11 19:20:58,814 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 19:21:01,585 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 651.72400 ± 177.731
2025-05-11 19:21:01,585 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [843.2735, 709.3838, 549.26697, 765.59265, 652.7966, 791.2826, 619.0342, 268.39905, 453.178, 865.0322]
2025-05-11 19:21:01,585 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [267.0, 227.0, 201.0, 253.0, 224.0, 259.0, 203.0, 122.0, 179.0, 278.0]
2025-05-11 19:21:01,599 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 14 minutes)
2025-05-11 19:23:39,712 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 19:23:42,826 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 774.79364 ± 115.834
2025-05-11 19:23:42,826 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [666.86536, 802.1551, 513.9902, 810.5722, 988.79913, 742.546, 807.12616, 776.65643, 836.5667, 802.6589]
2025-05-11 19:23:42,826 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [227.0, 251.0, 184.0, 272.0, 297.0, 231.0, 258.0, 241.0, 263.0, 249.0]
2025-05-11 19:23:42,827 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1226 [INFO]: New best (774.79) for latency MM1Queue_a033_s075
2025-05-11 19:23:42,827 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1229 [INFO]: saving network
2025-05-11 19:23:42,832 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 19:23:42,849 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 11 minutes, 33 seconds)
2025-05-11 19:26:22,137 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 19:26:24,864 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 630.28772 ± 207.488
2025-05-11 19:26:24,864 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [816.291, 466.9049, 893.404, 405.90707, 632.92676, 743.7951, 813.69556, 508.83878, 793.13104, 227.98352]
2025-05-11 19:26:24,864 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [260.0, 167.0, 274.0, 165.0, 228.0, 252.0, 269.0, 192.0, 258.0, 109.0]
2025-05-11 19:26:24,878 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 9 minutes, 23 seconds)
2025-05-11 19:29:00,906 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 19:29:03,894 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 721.21149 ± 66.475
2025-05-11 19:29:03,894 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [609.23975, 756.9131, 732.4074, 704.56036, 712.1262, 597.5252, 742.20667, 769.6085, 819.09106, 768.4369]
2025-05-11 19:29:03,894 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [216.0, 250.0, 237.0, 236.0, 231.0, 211.0, 245.0, 251.0, 263.0, 242.0]
2025-05-11 19:29:03,908 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 6 minutes, 48 seconds)
2025-05-11 19:31:42,069 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 19:31:45,033 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 727.32379 ± 198.445
2025-05-11 19:31:45,034 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [737.3339, 167.32892, 932.3966, 807.4727, 813.7127, 808.3504, 710.44806, 855.3506, 740.81274, 700.03076]
2025-05-11 19:31:45,034 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [236.0, 87.0, 298.0, 254.0, 256.0, 252.0, 230.0, 266.0, 245.0, 232.0]
2025-05-11 19:31:45,048 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 4 minutes, 8 seconds)
2025-05-11 19:34:23,852 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 19:34:26,434 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 606.06940 ± 213.026
2025-05-11 19:34:26,434 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [779.0258, 491.12497, 782.7715, 724.05145, 147.53023, 692.26636, 609.79834, 770.45496, 297.54297, 766.1275]
2025-05-11 19:34:26,435 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [247.0, 180.0, 248.0, 232.0, 83.0, 225.0, 228.0, 240.0, 123.0, 265.0]
2025-05-11 19:34:26,449 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 1 minute, 42 seconds)
2025-05-11 19:37:06,367 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 19:37:09,165 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 661.65912 ± 261.066
2025-05-11 19:37:09,166 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [141.59181, 274.56863, 806.6872, 814.1925, 685.1405, 445.94354, 818.6532, 865.94116, 938.68665, 825.18634]
2025-05-11 19:37:09,166 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [76.0, 138.0, 259.0, 256.0, 228.0, 175.0, 256.0, 278.0, 293.0, 270.0]
2025-05-11 19:37:09,180 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 79/100 (estimated time remaining: 59 minutes, 7 seconds)
2025-05-11 19:39:45,276 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 19:39:47,973 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 630.11292 ± 120.026
2025-05-11 19:39:47,973 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [778.38007, 707.0261, 694.4493, 428.24847, 696.6595, 380.1442, 678.0907, 650.8419, 618.4973, 668.7908]
2025-05-11 19:39:47,973 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [244.0, 243.0, 236.0, 166.0, 232.0, 157.0, 231.0, 219.0, 216.0, 227.0]
2025-05-11 19:39:47,988 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 80/100 (estimated time remaining: 56 minutes, 13 seconds)
2025-05-11 19:42:27,088 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 19:42:29,976 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 718.82336 ± 77.606
2025-05-11 19:42:29,976 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [712.8942, 802.4067, 684.2774, 754.5756, 686.82117, 798.5216, 522.57104, 712.5626, 787.78296, 725.82043]
2025-05-11 19:42:29,976 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [231.0, 248.0, 223.0, 243.0, 224.0, 253.0, 188.0, 235.0, 245.0, 227.0]
2025-05-11 19:42:29,991 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 81/100 (estimated time remaining: 53 minutes, 44 seconds)
2025-05-11 19:45:07,783 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 19:45:10,590 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 682.30865 ± 182.283
2025-05-11 19:45:10,590 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [531.4882, 778.6416, 814.268, 775.13525, 467.80356, 272.45404, 729.31946, 841.8557, 775.01086, 837.1097]
2025-05-11 19:45:10,590 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [194.0, 243.0, 252.0, 246.0, 179.0, 126.0, 227.0, 263.0, 242.0, 267.0]
2025-05-11 19:45:10,605 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 82/100 (estimated time remaining: 51 minutes, 1 second)
2025-05-11 19:47:45,388 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 19:47:48,370 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 721.06323 ± 61.449
2025-05-11 19:47:48,370 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [587.634, 803.4628, 681.4493, 778.7031, 775.8053, 661.89575, 719.9246, 762.7102, 715.20734, 723.8397]
2025-05-11 19:47:48,370 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [215.0, 254.0, 222.0, 252.0, 251.0, 224.0, 236.0, 250.0, 238.0, 238.0]
2025-05-11 19:47:48,385 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 83/100 (estimated time remaining: 48 minutes, 6 seconds)
2025-05-11 19:50:30,546 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 19:50:33,214 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 639.48431 ± 181.576
2025-05-11 19:50:33,215 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [806.27856, 789.1715, 405.2672, 697.4588, 797.30225, 504.53568, 815.6732, 572.4346, 273.64703, 733.07416]
2025-05-11 19:50:33,215 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [252.0, 244.0, 152.0, 232.0, 246.0, 191.0, 254.0, 199.0, 123.0, 243.0]
2025-05-11 19:50:33,230 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 84/100 (estimated time remaining: 45 minutes, 33 seconds)
2025-05-11 19:53:11,368 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 19:53:13,861 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 587.61597 ± 234.463
2025-05-11 19:53:13,861 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [740.3997, 499.18848, 422.54712, 686.16864, 698.45953, 837.1981, 107.77206, 833.4779, 758.5032, 292.44464]
2025-05-11 19:53:13,861 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [245.0, 188.0, 165.0, 230.0, 230.0, 258.0, 66.0, 254.0, 246.0, 120.0]
2025-05-11 19:53:13,877 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 85/100 (estimated time remaining: 42 minutes, 58 seconds)
2025-05-11 19:55:56,853 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 19:55:59,777 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 714.54047 ± 162.917
2025-05-11 19:55:59,777 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [355.87155, 847.02576, 714.95874, 816.03424, 491.6914, 829.9822, 831.4397, 616.5678, 816.6212, 825.2123]
2025-05-11 19:55:59,777 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [152.0, 266.0, 233.0, 251.0, 184.0, 271.0, 255.0, 215.0, 248.0, 255.0]
2025-05-11 19:55:59,793 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 86/100 (estimated time remaining: 40 minutes, 29 seconds)
2025-05-11 19:58:42,628 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 19:58:45,311 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 644.19025 ± 223.411
2025-05-11 19:58:45,311 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [259.73465, 823.4951, 186.92235, 723.4179, 784.9466, 886.91125, 781.7849, 692.231, 608.03937, 694.41895]
2025-05-11 19:58:45,311 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [112.0, 256.0, 91.0, 240.0, 247.0, 271.0, 242.0, 233.0, 221.0, 232.0]
2025-05-11 19:58:45,327 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 87/100 (estimated time remaining: 38 minutes, 1 second)
2025-05-11 20:01:29,348 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 20:01:32,298 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 694.25159 ± 179.008
2025-05-11 20:01:32,298 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [854.1921, 889.05835, 800.5293, 906.90186, 662.0819, 662.35284, 312.62766, 638.203, 476.41165, 740.1569]
2025-05-11 20:01:32,298 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [268.0, 272.0, 254.0, 286.0, 234.0, 234.0, 139.0, 227.0, 181.0, 250.0]
2025-05-11 20:01:32,314 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 88/100 (estimated time remaining: 35 minutes, 42 seconds)
2025-05-11 20:04:11,489 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 20:04:14,300 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 666.26105 ± 284.860
2025-05-11 20:04:14,300 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [571.5588, 273.65424, 833.48126, 913.4268, 788.6682, 872.99426, 192.13667, 327.94843, 882.6106, 1006.13074]
2025-05-11 20:04:14,300 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [180.0, 121.0, 261.0, 289.0, 248.0, 285.0, 96.0, 140.0, 284.0, 334.0]
2025-05-11 20:04:14,316 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 89/100 (estimated time remaining: 32 minutes, 50 seconds)
2025-05-11 20:06:59,515 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 20:07:02,446 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 706.09760 ± 172.872
2025-05-11 20:07:02,446 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [570.2201, 784.61426, 862.83167, 400.5323, 869.6934, 775.8845, 751.30273, 842.5203, 398.9659, 804.41125]
2025-05-11 20:07:02,446 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [217.0, 247.0, 273.0, 156.0, 267.0, 247.0, 248.0, 272.0, 157.0, 258.0]
2025-05-11 20:07:02,463 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 90/100 (estimated time remaining: 30 minutes, 22 seconds)
2025-05-11 20:09:42,476 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 20:09:44,989 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 597.46008 ± 257.534
2025-05-11 20:09:44,989 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [294.3922, 130.7995, 539.92413, 851.20044, 843.81335, 715.75146, 810.41626, 815.32043, 269.93173, 703.05164]
2025-05-11 20:09:44,989 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [127.0, 69.0, 180.0, 259.0, 256.0, 240.0, 264.0, 252.0, 117.0, 236.0]
2025-05-11 20:09:45,005 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 91/100 (estimated time remaining: 27 minutes, 30 seconds)
2025-05-11 20:12:26,687 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 20:12:29,712 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 752.77209 ± 81.821
2025-05-11 20:12:29,712 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [778.22626, 733.62384, 815.38336, 795.1214, 792.31903, 819.5626, 529.5646, 729.32056, 808.53394, 726.0659]
2025-05-11 20:12:29,712 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [252.0, 243.0, 252.0, 251.0, 248.0, 253.0, 180.0, 234.0, 257.0, 239.0]
2025-05-11 20:12:29,729 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 92/100 (estimated time remaining: 24 minutes, 43 seconds)
2025-05-11 20:15:13,630 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 20:15:16,729 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 725.09729 ± 253.566
2025-05-11 20:15:16,729 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [1027.5804, 849.9375, 270.82858, 691.63776, 753.12787, 879.6312, 322.43784, 758.8594, 1083.0063, 613.9262]
2025-05-11 20:15:16,729 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [341.0, 270.0, 117.0, 243.0, 253.0, 281.0, 136.0, 251.0, 340.0, 221.0]
2025-05-11 20:15:16,746 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 93/100 (estimated time remaining: 21 minutes, 59 seconds)
2025-05-11 20:17:59,536 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 20:18:02,444 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 706.74670 ± 123.689
2025-05-11 20:18:02,444 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [503.64655, 787.3047, 744.507, 771.7617, 821.905, 702.4108, 756.6679, 776.1227, 432.9149, 770.22577]
2025-05-11 20:18:02,444 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [183.0, 241.0, 258.0, 243.0, 256.0, 226.0, 239.0, 244.0, 167.0, 264.0]
2025-05-11 20:18:02,462 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 94/100 (estimated time remaining: 19 minutes, 19 seconds)
2025-05-11 20:20:42,113 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 20:20:45,074 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 716.23590 ± 189.536
2025-05-11 20:20:45,074 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [726.7613, 839.5525, 226.03447, 517.2183, 843.59076, 802.5315, 872.6464, 785.5609, 820.8761, 727.5868]
2025-05-11 20:20:45,074 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [236.0, 280.0, 107.0, 187.0, 271.0, 260.0, 275.0, 248.0, 261.0, 234.0]
2025-05-11 20:20:45,091 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 95/100 (estimated time remaining: 16 minutes, 27 seconds)
2025-05-11 20:23:26,693 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 20:23:29,609 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 683.93390 ± 169.694
2025-05-11 20:23:29,609 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [246.72421, 711.8415, 671.9827, 928.5171, 677.9516, 699.8896, 670.6774, 689.54425, 670.18665, 872.0242]
2025-05-11 20:23:29,609 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [118.0, 238.0, 228.0, 303.0, 235.0, 233.0, 236.0, 228.0, 225.0, 275.0]
2025-05-11 20:23:29,627 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 96/100 (estimated time remaining: 13 minutes, 44 seconds)
2025-05-11 20:26:14,240 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 20:26:17,321 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 741.90845 ± 113.030
2025-05-11 20:26:17,322 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [824.48004, 701.7521, 489.96756, 802.3337, 926.4467, 642.81177, 796.6712, 703.3115, 728.8355, 802.4746]
2025-05-11 20:26:17,322 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [256.0, 233.0, 186.0, 255.0, 293.0, 233.0, 244.0, 237.0, 247.0, 260.0]
2025-05-11 20:26:17,339 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 97/100 (estimated time remaining: 11 minutes, 2 seconds)
2025-05-11 20:28:59,144 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 20:29:01,945 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 679.83496 ± 186.654
2025-05-11 20:29:01,945 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [470.84048, 633.13403, 724.70325, 819.9884, 438.66345, 656.41895, 418.17166, 1010.17664, 745.8872, 880.36597]
2025-05-11 20:29:01,945 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [178.0, 218.0, 223.0, 260.0, 157.0, 224.0, 162.0, 309.0, 234.0, 272.0]
2025-05-11 20:29:01,963 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 98/100 (estimated time remaining: 8 minutes, 15 seconds)
2025-05-11 20:31:42,272 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 20:31:45,354 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 772.17029 ± 49.017
2025-05-11 20:31:45,354 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [802.2459, 745.1898, 848.3704, 777.463, 651.144, 795.72473, 748.8814, 785.78674, 769.74927, 797.1473]
2025-05-11 20:31:45,355 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [250.0, 240.0, 261.0, 246.0, 220.0, 258.0, 243.0, 251.0, 244.0, 249.0]
2025-05-11 20:31:45,373 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 99/100 (estimated time remaining: 5 minutes, 29 seconds)
2025-05-11 20:34:30,528 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 20:34:33,699 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 733.88190 ± 167.563
2025-05-11 20:34:33,699 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [764.0874, 810.41046, 832.4806, 856.04736, 696.4344, 273.2701, 785.139, 856.787, 637.06836, 827.09406]
2025-05-11 20:34:33,699 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [249.0, 258.0, 310.0, 271.0, 243.0, 122.0, 257.0, 295.0, 230.0, 280.0]
2025-05-11 20:34:33,717 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 45 seconds)
2025-05-11 20:37:16,122 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 20:37:18,982 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1221 [DEBUG]: Total Reward: 704.56921 ± 172.461
2025-05-11 20:37:18,982 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1222 [DEBUG]: All rewards: [818.36304, 776.1225, 864.7561, 264.0385, 577.547, 801.9761, 780.431, 589.4764, 814.6355, 758.3453]
2025-05-11 20:37:18,983 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [253.0, 243.0, 262.0, 115.0, 195.0, 250.0, 250.0, 212.0, 259.0, 244.0]
2025-05-11 20:37:19,001 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1251 [DEBUG]: Training session finished
