2025-05-08 02:50:50,954 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1006 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noisy-hopper/ExtremeSparseL4U32-sac
2025-05-08 02:50:50,954 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1007 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noisy-hopper/ExtremeSparseL4U32-sac
2025-05-08 02:50:50,954 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1008 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x7d36d81c3f10>}
2025-05-08 02:50:50,954 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1009 [DEBUG]: using device: cpu
2025-05-08 02:50:50,958 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1031 [INFO]: Creating new trainer
2025-05-08 02:50:50,963 baseline-sac-noisy-hopper:111 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=11, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2.]]), shift: tensor([[-1., -1., -1.]]))
)
2025-05-08 02:50:50,963 baseline-sac-noisy-hopper:112 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=14, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-08 02:50:51,114 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1092 [DEBUG]: Starting training session...
2025-05-08 02:50:51,114 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 1/100
2025-05-08 02:53:09,545 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 02:53:09,767 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 13.31743 ± 7.460
2025-05-08 02:53:09,767 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [18.530663, 11.866139, 5.700686, 13.067323, 5.1475177, 22.98817, 27.026417, 16.904394, 5.6487703, 6.2941823]
2025-05-08 02:53:09,767 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [28.0, 18.0, 15.0, 21.0, 21.0, 22.0, 33.0, 23.0, 15.0, 20.0]
2025-05-08 02:53:09,767 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1124 [INFO]: New best (13.32) for latency ExtremeSparseL4U32
2025-05-08 02:53:09,767 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1127 [INFO]: saving network
2025-05-08 02:53:09,771 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-hopper/ExtremeSparseL4U32-sac/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 02:53:09,776 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 2/100 (estimated time remaining: 3 hours, 48 minutes, 47 seconds)
2025-05-08 02:55:34,406 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 02:55:34,735 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 38.85422 ± 10.266
2025-05-08 02:55:34,735 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [41.659836, 34.380985, 42.922882, 35.539528, 17.2506, 42.515076, 42.12339, 60.91308, 35.40531, 35.83148]
2025-05-08 02:55:34,735 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [45.0, 29.0, 33.0, 29.0, 17.0, 33.0, 34.0, 45.0, 30.0, 30.0]
2025-05-08 02:55:34,735 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1124 [INFO]: New best (38.85) for latency ExtremeSparseL4U32
2025-05-08 02:55:34,736 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1127 [INFO]: saving network
2025-05-08 02:55:34,740 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-hopper/ExtremeSparseL4U32-sac/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 02:55:34,745 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 51 minutes, 37 seconds)
2025-05-08 02:58:00,601 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 02:58:00,855 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 30.82325 ± 17.421
2025-05-08 02:58:00,855 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [48.36094, 12.437445, 35.536785, 69.954544, 17.475851, 14.495554, 37.856087, 14.058258, 23.829367, 34.227642]
2025-05-08 02:58:00,855 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [38.0, 14.0, 27.0, 50.0, 17.0, 15.0, 29.0, 14.0, 24.0, 27.0]
2025-05-08 02:58:00,857 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 51 minutes, 34 seconds)
2025-05-08 03:00:26,953 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 03:00:27,359 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 46.23189 ± 19.220
2025-05-08 03:00:27,359 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [36.23032, 26.4039, 34.283035, 19.285362, 41.401875, 78.40176, 31.928558, 67.711235, 66.70605, 59.96677]
2025-05-08 03:00:27,359 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [34.0, 36.0, 41.0, 29.0, 43.0, 58.0, 31.0, 44.0, 48.0, 38.0]
2025-05-08 03:00:27,360 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1124 [INFO]: New best (46.23) for latency ExtremeSparseL4U32
2025-05-08 03:00:27,360 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1127 [INFO]: saving network
2025-05-08 03:00:27,364 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-hopper/ExtremeSparseL4U32-sac/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 03:00:27,369 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 50 minutes, 30 seconds)
2025-05-08 03:02:53,269 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 03:02:53,591 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 33.88911 ± 16.412
2025-05-08 03:02:53,592 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [21.467829, 18.015657, 23.231773, 48.690437, 44.559143, 20.86741, 13.363724, 54.214672, 62.73588, 31.744543]
2025-05-08 03:02:53,592 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [22.0, 19.0, 24.0, 34.0, 41.0, 22.0, 15.0, 40.0, 53.0, 52.0]
2025-05-08 03:02:53,593 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 48 minutes, 47 seconds)
2025-05-08 03:05:18,521 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 03:05:18,948 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 53.32755 ± 35.499
2025-05-08 03:05:18,948 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [93.81138, 31.691889, 41.40217, 33.099125, 13.349772, 66.24842, 60.314377, 14.246644, 44.963047, 134.14867]
2025-05-08 03:05:18,948 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [66.0, 29.0, 36.0, 30.0, 15.0, 43.0, 46.0, 15.0, 39.0, 96.0]
2025-05-08 03:05:18,949 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1124 [INFO]: New best (53.33) for latency ExtremeSparseL4U32
2025-05-08 03:05:18,949 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1127 [INFO]: saving network
2025-05-08 03:05:18,952 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-hopper/ExtremeSparseL4U32-sac/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 03:05:18,958 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 48 minutes, 28 seconds)
2025-05-08 03:07:46,571 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 03:07:46,984 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 58.87351 ± 31.679
2025-05-08 03:07:46,984 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [93.21739, 67.19505, 63.15986, 82.60617, 21.66425, 19.501831, 20.811731, 39.233616, 64.08321, 117.261986]
2025-05-08 03:07:46,984 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [54.0, 51.0, 46.0, 58.0, 20.0, 19.0, 20.0, 32.0, 39.0, 78.0]
2025-05-08 03:07:46,984 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1124 [INFO]: New best (58.87) for latency ExtremeSparseL4U32
2025-05-08 03:07:46,985 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1127 [INFO]: saving network
2025-05-08 03:07:46,989 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-hopper/ExtremeSparseL4U32-sac/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 03:07:46,994 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 8/100 (estimated time remaining: 3 hours, 46 minutes, 59 seconds)
2025-05-08 03:10:17,718 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 03:10:18,218 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 63.03922 ± 32.364
2025-05-08 03:10:18,218 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [67.84999, 24.053173, 115.27996, 98.267365, 19.484447, 94.284195, 21.829807, 59.287136, 51.465145, 78.590996]
2025-05-08 03:10:18,218 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [58.0, 27.0, 105.0, 57.0, 24.0, 66.0, 24.0, 41.0, 37.0, 47.0]
2025-05-08 03:10:18,218 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1124 [INFO]: New best (63.04) for latency ExtremeSparseL4U32
2025-05-08 03:10:18,218 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1127 [INFO]: saving network
2025-05-08 03:10:18,222 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-hopper/ExtremeSparseL4U32-sac/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 03:10:18,228 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 9/100 (estimated time remaining: 3 hours, 46 minutes, 7 seconds)
2025-05-08 03:12:50,848 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 03:12:51,234 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 49.06377 ± 24.275
2025-05-08 03:12:51,235 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [17.618937, 26.281712, 30.790747, 48.02779, 51.190598, 23.31162, 59.231785, 100.34364, 69.34123, 64.49966]
2025-05-08 03:12:51,235 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [18.0, 24.0, 31.0, 35.0, 38.0, 22.0, 50.0, 67.0, 46.0, 45.0]
2025-05-08 03:12:51,237 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 10/100 (estimated time remaining: 3 hours, 45 minutes, 38 seconds)
2025-05-08 03:15:23,615 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 03:15:23,999 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 47.43852 ± 38.343
2025-05-08 03:15:23,999 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [80.483406, 16.08923, 40.615475, 71.12062, 140.01416, 51.911602, 11.438589, 20.106155, 18.135536, 24.470453]
2025-05-08 03:15:23,999 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [50.0, 19.0, 37.0, 59.0, 92.0, 43.0, 13.0, 20.0, 18.0, 22.0]
2025-05-08 03:15:24,001 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 11/100 (estimated time remaining: 3 hours, 45 minutes, 7 seconds)
2025-05-08 03:17:56,649 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 03:17:57,029 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 51.62621 ± 31.667
2025-05-08 03:17:57,029 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [47.348602, 34.839863, 21.022919, 81.09269, 104.29382, 22.347294, 83.71814, 22.935728, 84.17897, 14.484126]
2025-05-08 03:17:57,029 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [37.0, 29.0, 20.0, 47.0, 74.0, 21.0, 54.0, 21.0, 51.0, 15.0]
2025-05-08 03:17:57,031 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 12/100 (estimated time remaining: 3 hours, 44 minutes, 53 seconds)
2025-05-08 03:20:29,856 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 03:20:30,288 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 64.19900 ± 35.592
2025-05-08 03:20:30,288 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [112.2778, 70.43725, 103.95715, 56.35825, 67.580894, 59.578094, 20.616045, 23.090033, 13.784563, 114.30996]
2025-05-08 03:20:30,288 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [69.0, 47.0, 59.0, 35.0, 46.0, 37.0, 19.0, 21.0, 15.0, 68.0]
2025-05-08 03:20:30,289 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1124 [INFO]: New best (64.20) for latency ExtremeSparseL4U32
2025-05-08 03:20:30,289 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1127 [INFO]: saving network
2025-05-08 03:20:30,293 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-hopper/ExtremeSparseL4U32-sac/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 03:20:30,299 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 13/100 (estimated time remaining: 3 hours, 43 minutes, 54 seconds)
2025-05-08 03:23:01,936 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 03:23:02,324 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 56.74264 ± 36.991
2025-05-08 03:23:02,324 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [19.270618, 91.41381, 17.239143, 106.588615, 80.73786, 20.019812, 20.102314, 84.94222, 101.4832, 25.62881]
2025-05-08 03:23:02,324 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [22.0, 54.0, 17.0, 70.0, 49.0, 19.0, 19.0, 53.0, 59.0, 24.0]
2025-05-08 03:23:02,326 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 14/100 (estimated time remaining: 3 hours, 41 minutes, 35 seconds)
2025-05-08 03:25:33,795 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 03:25:34,373 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 86.62531 ± 38.386
2025-05-08 03:25:34,373 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [17.655169, 107.99126, 103.33431, 17.207062, 86.12198, 113.33206, 75.32596, 116.60551, 138.28778, 90.39198]
2025-05-08 03:25:34,373 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [17.0, 60.0, 56.0, 17.0, 53.0, 76.0, 59.0, 73.0, 72.0, 83.0]
2025-05-08 03:25:34,373 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1124 [INFO]: New best (86.63) for latency ExtremeSparseL4U32
2025-05-08 03:25:34,373 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1127 [INFO]: saving network
2025-05-08 03:25:34,377 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-hopper/ExtremeSparseL4U32-sac/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 03:25:34,384 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 15/100 (estimated time remaining: 3 hours, 38 minutes, 46 seconds)
2025-05-08 03:28:06,471 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 03:28:06,946 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 69.68693 ± 49.218
2025-05-08 03:28:06,946 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [90.114586, 61.466827, 35.865856, 106.13127, 28.499147, 19.318045, 14.611929, 41.048233, 162.3459, 137.46747]
2025-05-08 03:28:06,946 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [55.0, 42.0, 33.0, 63.0, 26.0, 20.0, 16.0, 41.0, 92.0, 74.0]
2025-05-08 03:28:06,949 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 16/100 (estimated time remaining: 3 hours, 36 minutes, 10 seconds)
2025-05-08 03:30:38,838 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 03:30:39,285 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 74.02425 ± 54.505
2025-05-08 03:30:39,285 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [71.88373, 13.776182, 87.46746, 165.53506, 144.02948, 109.58183, 13.125881, 11.783876, 103.11477, 19.944197]
2025-05-08 03:30:39,285 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [45.0, 15.0, 52.0, 91.0, 85.0, 60.0, 14.0, 13.0, 70.0, 19.0]
2025-05-08 03:30:39,288 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 17/100 (estimated time remaining: 3 hours, 33 minutes, 25 seconds)
2025-05-08 03:33:10,365 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 03:33:10,720 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 44.81515 ± 37.190
2025-05-08 03:33:10,720 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [37.617554, 81.711845, 14.062669, 107.462166, 15.509541, 17.161947, 17.464909, 11.517251, 108.16437, 37.47928]
2025-05-08 03:33:10,720 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [38.0, 64.0, 15.0, 65.0, 16.0, 17.0, 17.0, 13.0, 71.0, 34.0]
2025-05-08 03:33:10,723 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 18/100 (estimated time remaining: 3 hours, 30 minutes, 23 seconds)
2025-05-08 03:35:42,520 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 03:35:42,969 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 79.04402 ± 63.564
2025-05-08 03:35:42,970 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [65.065475, 157.92752, 21.600958, 17.053709, 197.17776, 19.569677, 14.470756, 105.93274, 139.95087, 51.69074]
2025-05-08 03:35:42,970 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [45.0, 74.0, 20.0, 18.0, 107.0, 19.0, 15.0, 62.0, 76.0, 37.0]
2025-05-08 03:35:42,972 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 19/100 (estimated time remaining: 3 hours, 27 minutes, 54 seconds)
2025-05-08 03:38:14,131 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 03:38:14,530 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 58.10142 ± 45.480
2025-05-08 03:38:14,530 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [22.531137, 123.87778, 70.118256, 20.215736, 16.95606, 17.242374, 82.82913, 143.73807, 12.852323, 70.65327]
2025-05-08 03:38:14,530 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [21.0, 80.0, 48.0, 20.0, 17.0, 18.0, 57.0, 89.0, 14.0, 54.0]
2025-05-08 03:38:14,533 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 20/100 (estimated time remaining: 3 hours, 25 minutes, 14 seconds)
2025-05-08 03:40:45,163 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 03:40:45,689 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 68.40079 ± 56.497
2025-05-08 03:40:45,690 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [21.839338, 27.09341, 60.72793, 108.50067, 40.300186, 130.47092, 199.34007, 22.818335, 21.102783, 51.81433]
2025-05-08 03:40:45,690 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [20.0, 35.0, 76.0, 87.0, 35.0, 82.0, 103.0, 21.0, 19.0, 38.0]
2025-05-08 03:40:45,693 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 21/100 (estimated time remaining: 3 hours, 22 minutes, 19 seconds)
2025-05-08 03:43:16,864 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 03:43:17,348 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 77.91390 ± 52.279
2025-05-08 03:43:17,348 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [84.69515, 168.40295, 103.40788, 20.327667, 117.655, 139.9281, 10.508034, 16.473354, 77.05971, 40.681168]
2025-05-08 03:43:17,349 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [65.0, 91.0, 62.0, 19.0, 66.0, 92.0, 13.0, 17.0, 44.0, 42.0]
2025-05-08 03:43:17,352 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 22/100 (estimated time remaining: 3 hours, 19 minutes, 37 seconds)
2025-05-08 03:45:49,020 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 03:45:49,644 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 108.68422 ± 31.156
2025-05-08 03:45:49,645 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [106.03693, 76.15224, 101.45647, 112.40258, 125.454384, 73.1693, 189.67159, 109.0912, 106.37277, 87.03466]
2025-05-08 03:45:49,645 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [59.0, 49.0, 64.0, 60.0, 70.0, 49.0, 111.0, 60.0, 58.0, 54.0]
2025-05-08 03:45:49,645 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1124 [INFO]: New best (108.68) for latency ExtremeSparseL4U32
2025-05-08 03:45:49,645 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1127 [INFO]: saving network
2025-05-08 03:45:49,649 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-hopper/ExtremeSparseL4U32-sac/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 03:45:49,656 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 23/100 (estimated time remaining: 3 hours, 17 minutes, 19 seconds)
2025-05-08 03:48:21,414 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 03:48:21,903 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 73.78578 ± 45.741
2025-05-08 03:48:21,903 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [98.56792, 146.77618, 111.589905, 115.98923, 104.89968, 54.260094, 16.505281, 9.967583, 20.458351, 58.843594]
2025-05-08 03:48:21,903 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [64.0, 78.0, 67.0, 73.0, 65.0, 38.0, 18.0, 14.0, 20.0, 52.0]
2025-05-08 03:48:21,906 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 24/100 (estimated time remaining: 3 hours, 14 minutes, 47 seconds)
2025-05-08 03:50:53,812 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 03:50:54,325 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 76.10085 ± 29.344
2025-05-08 03:50:54,325 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [88.015656, 56.66074, 80.86196, 92.04142, 39.975548, 81.403625, 149.55563, 53.91634, 55.56714, 63.010418]
2025-05-08 03:50:54,325 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [53.0, 56.0, 47.0, 56.0, 32.0, 48.0, 90.0, 38.0, 42.0, 41.0]
2025-05-08 03:50:54,328 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 25/100 (estimated time remaining: 3 hours, 12 minutes, 28 seconds)
2025-05-08 03:53:27,985 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 03:53:28,654 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 109.79718 ± 90.710
2025-05-08 03:53:28,655 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [210.64249, 50.28405, 256.3309, 67.239845, 33.63273, 65.736084, 265.66333, 17.928194, 85.80041, 44.71373]
2025-05-08 03:53:28,655 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [103.0, 37.0, 130.0, 40.0, 33.0, 57.0, 133.0, 18.0, 60.0, 35.0]
2025-05-08 03:53:28,655 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1124 [INFO]: New best (109.80) for latency ExtremeSparseL4U32
2025-05-08 03:53:28,655 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1127 [INFO]: saving network
2025-05-08 03:53:28,659 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-hopper/ExtremeSparseL4U32-sac/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 03:53:28,667 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 26/100 (estimated time remaining: 3 hours, 10 minutes, 44 seconds)
2025-05-08 03:56:02,088 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 03:56:02,689 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 83.03551 ± 64.436
2025-05-08 03:56:02,690 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [15.640421, 53.113205, 21.910734, 190.11691, 17.896023, 156.65508, 166.00615, 22.303875, 78.6751, 108.03747]
2025-05-08 03:56:02,690 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [16.0, 58.0, 20.0, 122.0, 18.0, 76.0, 104.0, 22.0, 59.0, 99.0]
2025-05-08 03:56:02,693 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 27/100 (estimated time remaining: 3 hours, 8 minutes, 47 seconds)
2025-05-08 03:58:34,229 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 03:58:34,724 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 58.40331 ± 44.784
2025-05-08 03:58:34,724 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [165.38742, 22.948032, 53.431057, 43.799503, 19.514277, 103.30337, 39.72715, 85.199104, 12.39122, 38.331917]
2025-05-08 03:58:34,724 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [94.0, 28.0, 48.0, 51.0, 18.0, 93.0, 29.0, 60.0, 15.0, 40.0]
2025-05-08 03:58:34,728 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 28/100 (estimated time remaining: 3 hours, 6 minutes, 10 seconds)
2025-05-08 04:01:08,300 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 04:01:08,748 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 73.95794 ± 68.504
2025-05-08 04:01:08,748 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [177.07504, 15.976648, 17.894857, 16.854595, 141.94128, 14.244351, 196.54388, 61.786816, 17.942698, 79.31922]
2025-05-08 04:01:08,748 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [92.0, 16.0, 17.0, 17.0, 72.0, 15.0, 88.0, 57.0, 18.0, 57.0]
2025-05-08 04:01:08,752 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 29/100 (estimated time remaining: 3 hours, 4 minutes, 2 seconds)
2025-05-08 04:03:40,449 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 04:03:41,090 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 88.12907 ± 63.155
2025-05-08 04:03:41,090 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [125.759155, 42.329445, 238.22104, 87.80835, 72.35366, 20.747751, 46.93346, 48.404095, 46.793453, 151.94035]
2025-05-08 04:03:41,090 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [113.0, 40.0, 123.0, 72.0, 51.0, 20.0, 35.0, 29.0, 49.0, 81.0]
2025-05-08 04:03:41,094 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 30/100 (estimated time remaining: 3 hours, 1 minute, 28 seconds)
2025-05-08 04:06:12,702 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 04:06:13,086 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 56.36806 ± 60.350
2025-05-08 04:06:13,086 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [18.322523, 86.23247, 17.245216, 76.54768, 15.485286, 93.93847, 15.480459, 13.422401, 15.994197, 211.01192]
2025-05-08 04:06:13,086 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [18.0, 64.0, 17.0, 52.0, 16.0, 61.0, 16.0, 15.0, 17.0, 102.0]
2025-05-08 04:06:13,090 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 58 minutes, 21 seconds)
2025-05-08 04:08:42,732 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 04:08:42,887 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 14.27946 ± 3.293
2025-05-08 04:08:42,887 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [12.388505, 13.094053, 23.569174, 12.9375105, 13.7871065, 12.427645, 12.109627, 12.025342, 15.018286, 15.437358]
2025-05-08 04:08:42,887 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [14.0, 14.0, 21.0, 14.0, 15.0, 14.0, 14.0, 14.0, 16.0, 16.0]
2025-05-08 04:08:42,891 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 54 minutes, 50 seconds)
2025-05-08 04:11:13,402 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 04:11:13,627 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 16.59780 ± 7.324
2025-05-08 04:11:13,627 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [8.716591, 11.506633, 30.681593, 15.101637, 14.616792, 27.50969, 22.471535, 15.289599, 11.172702, 8.911175]
2025-05-08 04:11:13,627 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [13.0, 16.0, 34.0, 17.0, 19.0, 41.0, 26.0, 24.0, 16.0, 14.0]
2025-05-08 04:11:13,632 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 52 minutes, 1 second)
2025-05-08 04:13:45,695 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 04:13:45,915 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 22.68550 ± 21.582
2025-05-08 04:13:45,915 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [11.194737, 23.277353, 11.005179, 16.175594, 12.681968, 86.05848, 12.881519, 18.345226, 23.277472, 11.957459]
2025-05-08 04:13:45,915 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [13.0, 23.0, 13.0, 18.0, 16.0, 63.0, 16.0, 18.0, 22.0, 15.0]
2025-05-08 04:13:45,919 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 49 minutes, 6 seconds)
2025-05-08 04:16:15,581 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 04:16:16,247 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 114.64397 ± 48.412
2025-05-08 04:16:16,247 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [36.535816, 182.82666, 120.61066, 110.92859, 157.04053, 76.88465, 40.891216, 164.53865, 151.80484, 104.378044]
2025-05-08 04:16:16,247 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [34.0, 87.0, 69.0, 64.0, 75.0, 53.0, 33.0, 90.0, 86.0, 65.0]
2025-05-08 04:16:16,247 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1124 [INFO]: New best (114.64) for latency ExtremeSparseL4U32
2025-05-08 04:16:16,247 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1127 [INFO]: saving network
2025-05-08 04:16:16,251 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-hopper/ExtremeSparseL4U32-sac/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 04:16:16,259 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 46 minutes, 8 seconds)
2025-05-08 04:18:46,948 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 04:18:47,534 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 87.95660 ± 65.042
2025-05-08 04:18:47,535 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [75.0958, 75.39695, 110.66635, 63.445324, 176.11086, 32.653324, 101.83622, 10.678166, 222.20319, 11.479814]
2025-05-08 04:18:47,535 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [57.0, 56.0, 58.0, 57.0, 93.0, 36.0, 75.0, 13.0, 114.0, 13.0]
2025-05-08 04:18:47,539 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 43 minutes, 27 seconds)
2025-05-08 04:21:20,495 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 04:21:21,134 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 110.17696 ± 83.430
2025-05-08 04:21:21,134 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [28.220768, 223.14934, 54.283817, 228.03932, 105.66905, 198.10475, 51.647915, 10.547466, 24.108492, 177.99863]
2025-05-08 04:21:21,134 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [28.0, 108.0, 39.0, 117.0, 66.0, 107.0, 38.0, 13.0, 22.0, 92.0]
2025-05-08 04:21:21,138 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 41 minutes, 45 seconds)
2025-05-08 04:23:50,413 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 04:23:51,107 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 97.26078 ± 76.921
2025-05-08 04:23:51,107 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [21.202831, 45.617176, 140.69856, 59.189144, 75.26218, 70.73571, 257.59845, 79.70538, 210.75327, 11.845069]
2025-05-08 04:23:51,107 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [23.0, 44.0, 103.0, 47.0, 73.0, 54.0, 159.0, 52.0, 107.0, 15.0]
2025-05-08 04:23:51,112 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 39 minutes, 4 seconds)
2025-05-08 04:26:22,718 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 04:26:23,592 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 142.60040 ± 53.453
2025-05-08 04:26:23,592 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [168.95891, 171.19308, 28.71781, 77.97719, 134.81354, 103.57383, 194.98267, 157.37335, 198.785, 189.62862]
2025-05-08 04:26:23,592 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [89.0, 107.0, 24.0, 71.0, 88.0, 63.0, 114.0, 87.0, 105.0, 105.0]
2025-05-08 04:26:23,593 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1124 [INFO]: New best (142.60) for latency ExtremeSparseL4U32
2025-05-08 04:26:23,593 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1127 [INFO]: saving network
2025-05-08 04:26:23,596 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-hopper/ExtremeSparseL4U32-sac/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 04:26:23,605 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 36 minutes, 35 seconds)
2025-05-08 04:28:53,887 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 04:28:54,810 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 149.68576 ± 120.669
2025-05-08 04:28:54,810 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [20.559662, 12.954833, 244.16869, 175.1775, 226.66252, 126.77027, 430.276, 107.9364, 114.47446, 37.877254]
2025-05-08 04:28:54,810 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [19.0, 15.0, 114.0, 106.0, 144.0, 80.0, 221.0, 83.0, 81.0, 33.0]
2025-05-08 04:28:54,810 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1124 [INFO]: New best (149.69) for latency ExtremeSparseL4U32
2025-05-08 04:28:54,811 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1127 [INFO]: saving network
2025-05-08 04:28:54,815 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-hopper/ExtremeSparseL4U32-sac/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 04:28:54,824 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 34 minutes, 14 seconds)
2025-05-08 04:31:27,823 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 04:31:28,941 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 206.95987 ± 111.495
2025-05-08 04:31:28,941 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [230.5039, 95.41101, 81.44908, 276.46027, 326.0093, 83.552475, 46.34272, 359.36588, 280.54587, 289.9581]
2025-05-08 04:31:28,942 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [114.0, 64.0, 56.0, 130.0, 152.0, 56.0, 55.0, 157.0, 154.0, 143.0]
2025-05-08 04:31:28,942 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1124 [INFO]: New best (206.96) for latency ExtremeSparseL4U32
2025-05-08 04:31:28,942 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1127 [INFO]: saving network
2025-05-08 04:31:28,946 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-hopper/ExtremeSparseL4U32-sac/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 04:31:28,955 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 41/100 (estimated time remaining: 2 hours, 32 minutes, 16 seconds)
2025-05-08 04:33:58,791 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 04:33:59,783 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 174.40074 ± 108.490
2025-05-08 04:33:59,784 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [332.16138, 275.35562, 88.10866, 321.6071, 82.445404, 108.42578, 240.88445, 21.796654, 65.433624, 207.78877]
2025-05-08 04:33:59,784 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [150.0, 144.0, 59.0, 158.0, 58.0, 80.0, 118.0, 20.0, 61.0, 113.0]
2025-05-08 04:33:59,789 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 42/100 (estimated time remaining: 2 hours, 29 minutes, 12 seconds)
2025-05-08 04:36:32,470 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 04:36:33,320 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 152.14378 ± 108.154
2025-05-08 04:36:33,321 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [330.4155, 153.93234, 211.2268, 20.428495, 24.72021, 21.4089, 301.57162, 101.181114, 129.65181, 226.90097]
2025-05-08 04:36:33,321 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [148.0, 78.0, 101.0, 20.0, 22.0, 20.0, 154.0, 63.0, 108.0, 112.0]
2025-05-08 04:36:33,326 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 43/100 (estimated time remaining: 2 hours, 27 minutes, 21 seconds)
2025-05-08 04:39:02,751 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 04:39:03,396 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 116.21771 ± 113.554
2025-05-08 04:39:03,396 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [13.321039, 91.43513, 235.51183, 89.47351, 13.985568, 314.19946, 68.81393, 25.336464, 295.8076, 14.292626]
2025-05-08 04:39:03,396 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [14.0, 62.0, 103.0, 66.0, 15.0, 136.0, 42.0, 23.0, 145.0, 15.0]
2025-05-08 04:39:03,401 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 44/100 (estimated time remaining: 2 hours, 24 minutes, 21 seconds)
2025-05-08 04:41:35,010 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 04:41:35,495 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 80.42591 ± 104.096
2025-05-08 04:41:35,495 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [22.375267, 20.075808, 316.69556, 239.8428, 20.051605, 21.533167, 12.8506775, 20.191223, 111.93585, 18.707193]
2025-05-08 04:41:35,495 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [21.0, 19.0, 160.0, 106.0, 19.0, 20.0, 14.0, 19.0, 72.0, 18.0]
2025-05-08 04:41:35,500 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 45/100 (estimated time remaining: 2 hours, 21 minutes, 59 seconds)
2025-05-08 04:44:07,772 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 04:44:08,643 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 171.29735 ± 117.251
2025-05-08 04:44:08,644 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [14.790678, 226.78094, 14.845773, 73.83939, 343.93887, 283.80533, 209.35197, 49.860435, 195.24191, 300.51816]
2025-05-08 04:44:08,644 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [15.0, 109.0, 15.0, 59.0, 147.0, 130.0, 94.0, 46.0, 108.0, 119.0]
2025-05-08 04:44:08,649 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 46/100 (estimated time remaining: 2 hours, 19 minutes, 16 seconds)
2025-05-08 04:46:39,730 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 04:46:40,642 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 164.13426 ± 107.463
2025-05-08 04:46:40,643 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [107.74804, 71.989845, 118.85652, 322.14832, 113.99443, 253.88118, 231.55228, 339.5269, 56.75479, 24.890429]
2025-05-08 04:46:40,643 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [65.0, 43.0, 69.0, 176.0, 68.0, 119.0, 114.0, 160.0, 42.0, 20.0]
2025-05-08 04:46:40,648 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 47/100 (estimated time remaining: 2 hours, 16 minutes, 57 seconds)
2025-05-08 04:49:11,565 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 04:49:12,071 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 73.37535 ± 117.774
2025-05-08 04:49:12,071 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [14.9748745, 12.149442, 11.057919, 10.962612, 10.009926, 11.821271, 10.6876955, 46.06059, 249.06642, 356.96277]
2025-05-08 04:49:12,071 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [18.0, 16.0, 15.0, 15.0, 16.0, 16.0, 15.0, 46.0, 131.0, 196.0]
2025-05-08 04:49:12,077 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 48/100 (estimated time remaining: 2 hours, 14 minutes, 2 seconds)
2025-05-08 04:51:42,757 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 04:51:43,501 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 125.65839 ± 101.600
2025-05-08 04:51:43,501 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [16.891773, 114.59119, 21.709965, 232.42859, 61.605354, 59.265118, 289.4909, 163.4157, 23.615904, 273.56934]
2025-05-08 04:51:43,501 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [16.0, 81.0, 20.0, 114.0, 43.0, 41.0, 154.0, 97.0, 27.0, 124.0]
2025-05-08 04:51:43,507 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 49/100 (estimated time remaining: 2 hours, 11 minutes, 45 seconds)
2025-05-08 04:54:13,219 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 04:54:13,448 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 27.27687 ± 31.085
2025-05-08 04:54:13,449 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [19.133713, 112.62089, 12.323138, 54.231426, 16.227165, 9.590145, 12.682469, 14.394772, 9.240891, 12.324107]
2025-05-08 04:54:13,449 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [19.0, 70.0, 16.0, 37.0, 17.0, 14.0, 14.0, 15.0, 13.0, 15.0]
2025-05-08 04:54:13,454 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 50/100 (estimated time remaining: 2 hours, 8 minutes, 51 seconds)
2025-05-08 04:56:45,523 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 04:56:46,259 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 127.48519 ± 80.459
2025-05-08 04:56:46,259 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [11.72603, 253.49951, 166.5276, 179.74171, 18.004076, 149.38826, 234.03638, 46.769993, 113.68121, 101.47716]
2025-05-08 04:56:46,259 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [13.0, 116.0, 85.0, 97.0, 19.0, 79.0, 116.0, 36.0, 71.0, 80.0]
2025-05-08 04:56:46,265 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 51/100 (estimated time remaining: 2 hours, 6 minutes, 16 seconds)
2025-05-08 04:59:16,915 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 04:59:17,708 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 139.99490 ± 111.499
2025-05-08 04:59:17,708 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [77.199715, 258.18033, 267.88528, 25.95189, 43.952896, 49.60142, 42.499744, 262.5531, 61.801674, 310.32288]
2025-05-08 04:59:17,709 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [48.0, 143.0, 115.0, 31.0, 47.0, 37.0, 34.0, 118.0, 59.0, 144.0]
2025-05-08 04:59:17,715 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 52/100 (estimated time remaining: 2 hours, 3 minutes, 39 seconds)
2025-05-08 05:01:50,325 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 05:01:50,789 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 72.32640 ± 62.219
2025-05-08 05:01:50,789 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [20.095726, 35.173023, 17.276306, 17.639643, 16.072403, 196.70363, 43.547523, 111.38793, 145.09306, 120.27479]
2025-05-08 05:01:50,789 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [19.0, 30.0, 17.0, 17.0, 16.0, 106.0, 41.0, 62.0, 74.0, 70.0]
2025-05-08 05:01:50,795 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 53/100 (estimated time remaining: 2 hours, 1 minute, 23 seconds)
2025-05-08 05:04:21,375 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 05:04:22,038 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 131.87724 ± 136.787
2025-05-08 05:04:22,038 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [14.7362585, 359.75427, 8.41542, 311.9324, 17.991638, 189.77992, 77.59331, 301.58936, 20.623566, 16.35632]
2025-05-08 05:04:22,038 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [15.0, 142.0, 13.0, 125.0, 18.0, 97.0, 61.0, 130.0, 20.0, 17.0]
2025-05-08 05:04:22,044 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 58 minutes, 50 seconds)
2025-05-08 05:06:55,415 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 05:06:55,811 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 51.16346 ± 60.666
2025-05-08 05:06:55,811 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [13.385051, 215.69424, 20.0136, 69.41637, 9.306999, 20.57717, 89.59734, 46.095814, 16.594023, 10.954]
2025-05-08 05:06:55,811 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [14.0, 109.0, 24.0, 74.0, 12.0, 19.0, 67.0, 36.0, 17.0, 12.0]
2025-05-08 05:06:55,817 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 56 minutes, 53 seconds)
2025-05-08 05:09:26,028 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 05:09:26,691 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 135.42456 ± 144.253
2025-05-08 05:09:26,691 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [28.327898, 333.40576, 20.34337, 330.10242, 15.138822, 325.95377, 18.673763, 16.958473, 250.63014, 14.7111635]
2025-05-08 05:09:26,691 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [24.0, 146.0, 18.0, 135.0, 15.0, 130.0, 18.0, 17.0, 119.0, 15.0]
2025-05-08 05:09:26,698 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 54 minutes, 3 seconds)
2025-05-08 05:11:59,450 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 05:12:00,410 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 184.86406 ± 132.902
2025-05-08 05:12:00,410 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [182.19154, 350.99258, 17.068516, 288.78546, 292.31635, 42.864227, 63.39508, 231.31084, 365.06604, 14.650014]
2025-05-08 05:12:00,410 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [99.0, 153.0, 17.0, 147.0, 128.0, 35.0, 39.0, 122.0, 184.0, 15.0]
2025-05-08 05:12:00,417 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 51 minutes, 51 seconds)
2025-05-08 05:14:32,376 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 05:14:33,449 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 216.35031 ± 97.427
2025-05-08 05:14:33,449 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [28.454376, 183.68683, 146.64616, 191.26376, 367.59106, 293.0462, 310.56485, 118.4517, 232.79094, 291.00726]
2025-05-08 05:14:33,449 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [47.0, 94.0, 75.0, 88.0, 146.0, 130.0, 144.0, 76.0, 123.0, 124.0]
2025-05-08 05:14:33,450 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1124 [INFO]: New best (216.35) for latency ExtremeSparseL4U32
2025-05-08 05:14:33,450 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1127 [INFO]: saving network
2025-05-08 05:14:33,453 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-hopper/ExtremeSparseL4U32-sac/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 05:14:33,464 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 49 minutes, 18 seconds)
2025-05-08 05:17:07,074 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 05:17:07,656 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 104.84364 ± 109.058
2025-05-08 05:17:07,656 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [8.257182, 38.215424, 31.1757, 22.533533, 9.539053, 238.20564, 43.065422, 279.65707, 95.103615, 282.68378]
2025-05-08 05:17:07,656 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [14.0, 36.0, 37.0, 20.0, 15.0, 105.0, 35.0, 111.0, 59.0, 137.0]
2025-05-08 05:17:07,663 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 47 minutes, 11 seconds)
2025-05-08 05:19:39,634 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 05:19:40,414 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 135.72260 ± 122.466
2025-05-08 05:19:40,414 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [267.858, 73.97571, 249.37692, 266.26352, 38.13599, 344.22638, 26.59431, 14.900353, 30.05624, 45.838512]
2025-05-08 05:19:40,414 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [142.0, 64.0, 131.0, 133.0, 29.0, 142.0, 27.0, 15.0, 24.0, 35.0]
2025-05-08 05:19:40,421 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 44 minutes, 29 seconds)
2025-05-08 05:22:10,861 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 05:22:11,279 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 54.23806 ± 70.027
2025-05-08 05:22:11,280 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [8.828997, 14.654243, 233.74594, 15.949399, 8.620844, 27.387035, 19.336504, 7.919053, 82.912254, 123.02633]
2025-05-08 05:22:11,280 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [12.0, 19.0, 131.0, 17.0, 16.0, 45.0, 18.0, 13.0, 48.0, 86.0]
2025-05-08 05:22:11,286 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 41 minutes, 56 seconds)
2025-05-08 05:24:45,473 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 05:24:46,428 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 186.05283 ± 148.993
2025-05-08 05:24:46,428 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [19.220959, 404.13556, 294.84323, 271.6209, 260.06317, 14.3308115, 14.544464, 15.802082, 195.57513, 370.39212]
2025-05-08 05:24:46,428 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [18.0, 160.0, 138.0, 145.0, 118.0, 15.0, 15.0, 16.0, 106.0, 192.0]
2025-05-08 05:24:46,435 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 39 minutes, 34 seconds)
2025-05-08 05:27:18,133 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 05:27:18,294 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 15.68839 ± 2.678
2025-05-08 05:27:18,294 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [12.05401, 19.90548, 15.837282, 11.447762, 14.706102, 17.63266, 14.602444, 16.936302, 14.390918, 19.370964]
2025-05-08 05:27:18,294 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [13.0, 19.0, 16.0, 13.0, 15.0, 17.0, 15.0, 17.0, 15.0, 18.0]
2025-05-08 05:27:18,301 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 36 minutes, 52 seconds)
2025-05-08 05:29:49,726 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 05:29:50,418 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 132.28360 ± 130.286
2025-05-08 05:29:50,418 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [281.44696, 78.239426, 21.839443, 349.3833, 337.82693, 35.090866, 17.76927, 12.246087, 134.3138, 54.679783]
2025-05-08 05:29:50,419 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [112.0, 71.0, 20.0, 137.0, 149.0, 31.0, 17.0, 14.0, 73.0, 52.0]
2025-05-08 05:29:50,426 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 34 minutes, 4 seconds)
2025-05-08 05:32:23,551 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 05:32:23,901 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 51.97981 ± 84.479
2025-05-08 05:32:23,902 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [14.887305, 300.9541, 22.645737, 18.027302, 13.996843, 17.572937, 14.434227, 16.907465, 69.11973, 31.25248]
2025-05-08 05:32:23,902 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [15.0, 149.0, 20.0, 17.0, 14.0, 17.0, 15.0, 17.0, 46.0, 35.0]
2025-05-08 05:32:23,909 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 31 minutes, 37 seconds)
2025-05-08 05:34:55,724 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 05:34:56,185 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 81.57481 ± 135.942
2025-05-08 05:34:56,185 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [13.4713125, 357.30148, 14.94833, 13.248311, 15.83152, 349.53555, 10.621529, 15.862057, 12.843909, 12.084034]
2025-05-08 05:34:56,185 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [14.0, 172.0, 15.0, 14.0, 16.0, 160.0, 12.0, 16.0, 14.0, 13.0]
2025-05-08 05:34:56,192 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 29 minutes, 14 seconds)
2025-05-08 05:37:27,446 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 05:37:28,310 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 172.72438 ± 127.304
2025-05-08 05:37:28,311 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [25.926922, 249.06126, 13.691573, 255.38832, 256.11057, 20.179667, 313.3824, 18.573248, 322.54492, 252.38487]
2025-05-08 05:37:28,311 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [28.0, 120.0, 15.0, 133.0, 110.0, 19.0, 129.0, 18.0, 162.0, 101.0]
2025-05-08 05:37:28,318 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 26 minutes, 20 seconds)
2025-05-08 05:40:00,757 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 05:40:01,707 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 159.13644 ± 142.566
2025-05-08 05:40:01,707 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [89.98137, 374.77112, 186.6586, 24.305319, 48.162033, 40.910027, 20.396051, 420.94144, 101.22416, 284.01447]
2025-05-08 05:40:01,707 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [74.0, 183.0, 90.0, 22.0, 35.0, 31.0, 19.0, 223.0, 62.0, 168.0]
2025-05-08 05:40:01,714 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 23 minutes, 58 seconds)
2025-05-08 05:42:33,521 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 05:42:33,845 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 32.70038 ± 26.390
2025-05-08 05:42:33,845 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [12.072168, 31.175661, 78.19788, 83.32026, 13.334129, 21.633718, 10.350111, 18.14093, 11.042457, 47.736492]
2025-05-08 05:42:33,845 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [13.0, 41.0, 54.0, 92.0, 15.0, 20.0, 13.0, 17.0, 13.0, 38.0]
2025-05-08 05:42:33,853 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 21 minutes, 25 seconds)
2025-05-08 05:45:05,409 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 05:45:05,918 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 91.07915 ± 112.358
2025-05-08 05:45:05,918 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [153.02834, 277.43118, 15.472951, 321.82904, 21.372694, 14.933147, 24.27564, 6.7414494, 58.083298, 17.62369]
2025-05-08 05:45:05,918 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [88.0, 111.0, 16.0, 132.0, 20.0, 15.0, 22.0, 35.0, 40.0, 17.0]
2025-05-08 05:45:05,926 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 18 minutes, 44 seconds)
2025-05-08 05:47:39,116 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 05:47:39,725 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 98.79031 ± 112.157
2025-05-08 05:47:39,725 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [89.499435, 24.22837, 177.39365, 22.173782, 3.4984663, 38.105022, 288.68362, 312.18024, 10.877092, 21.263367]
2025-05-08 05:47:39,725 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [53.0, 30.0, 104.0, 20.0, 26.0, 49.0, 138.0, 140.0, 15.0, 20.0]
2025-05-08 05:47:39,733 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 16 minutes, 21 seconds)
2025-05-08 05:50:10,260 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 05:50:10,417 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 11.39142 ± 3.767
2025-05-08 05:50:10,417 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [14.825377, 9.338127, 8.53231, 10.383714, 7.7840185, 7.262077, 20.132421, 9.872892, 14.204698, 11.578534]
2025-05-08 05:50:10,417 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [16.0, 15.0, 14.0, 15.0, 14.0, 13.0, 19.0, 15.0, 17.0, 16.0]
2025-05-08 05:50:10,425 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 13 minutes, 40 seconds)
2025-05-08 05:52:43,585 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 05:52:44,684 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 222.38956 ± 153.566
2025-05-08 05:52:44,684 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [247.25536, 72.392944, 15.027953, 430.44592, 14.824082, 358.6697, 404.48816, 251.89293, 90.73829, 338.16016]
2025-05-08 05:52:44,684 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [105.0, 45.0, 15.0, 188.0, 15.0, 161.0, 181.0, 117.0, 84.0, 137.0]
2025-05-08 05:52:44,684 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1124 [INFO]: New best (222.39) for latency ExtremeSparseL4U32
2025-05-08 05:52:44,684 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1127 [INFO]: saving network
2025-05-08 05:52:44,688 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-hopper/ExtremeSparseL4U32-sac/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 05:52:44,700 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 11 minutes, 12 seconds)
2025-05-08 05:55:15,857 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 05:55:16,084 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 21.53196 ± 25.341
2025-05-08 05:55:16,084 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [96.55531, 11.293162, 10.23756, 9.840338, 13.19064, 14.126503, 12.59266, 11.862899, 24.824162, 10.796386]
2025-05-08 05:55:16,085 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [72.0, 14.0, 13.0, 14.0, 16.0, 17.0, 15.0, 14.0, 33.0, 15.0]
2025-05-08 05:55:16,093 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 8 minutes, 36 seconds)
2025-05-08 05:57:48,058 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 05:57:48,866 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 152.95694 ± 139.377
2025-05-08 05:57:48,866 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [246.65828, 426.98672, 124.06863, 17.779434, 271.4583, 22.83052, 35.208267, 297.23853, 14.226233, 73.11445]
2025-05-08 05:57:48,866 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [119.0, 190.0, 65.0, 19.0, 135.0, 22.0, 43.0, 132.0, 15.0, 45.0]
2025-05-08 05:57:48,874 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 6 minutes, 7 seconds)
2025-05-08 06:00:21,174 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 06:00:21,616 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 74.19864 ± 133.278
2025-05-08 06:00:21,616 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [444.92856, 9.48512, 6.743392, 10.325397, 7.3699045, 54.885433, 8.551422, 176.22707, 13.831651, 9.638411]
2025-05-08 06:00:21,616 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [201.0, 14.0, 12.0, 15.0, 13.0, 34.0, 13.0, 90.0, 16.0, 14.0]
2025-05-08 06:00:21,624 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 3 minutes, 29 seconds)
2025-05-08 06:02:54,002 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 06:02:54,797 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 146.42175 ± 121.342
2025-05-08 06:02:54,797 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [259.354, 39.224377, 14.741972, 332.61234, 35.665443, 320.37524, 214.21217, 50.74833, 27.420603, 169.86307]
2025-05-08 06:02:54,797 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [105.0, 32.0, 15.0, 154.0, 29.0, 133.0, 109.0, 46.0, 35.0, 98.0]
2025-05-08 06:02:54,806 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 1 minute, 9 seconds)
2025-05-08 06:05:27,122 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 06:05:28,002 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 154.98871 ± 106.511
2025-05-08 06:05:28,002 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [82.443146, 272.81863, 265.69196, 355.35684, 206.75804, 102.65176, 100.37745, 15.59573, 51.97874, 96.214714]
2025-05-08 06:05:28,002 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [69.0, 131.0, 129.0, 145.0, 130.0, 68.0, 61.0, 16.0, 34.0, 70.0]
2025-05-08 06:05:28,010 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 78/100 (estimated time remaining: 58 minutes, 31 seconds)
2025-05-08 06:08:01,172 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 06:08:02,053 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 147.19730 ± 152.063
2025-05-08 06:08:02,054 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [99.02722, 50.259296, 15.8163805, 11.265791, 9.688494, 22.221918, 405.99893, 159.48941, 331.9531, 366.25247]
2025-05-08 06:08:02,054 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [63.0, 40.0, 19.0, 15.0, 14.0, 20.0, 195.0, 100.0, 185.0, 189.0]
2025-05-08 06:08:02,062 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 79/100 (estimated time remaining: 56 minutes, 10 seconds)
2025-05-08 06:10:34,543 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 06:10:35,397 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 146.60118 ± 148.157
2025-05-08 06:10:35,397 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [163.32533, 370.32758, 19.250668, 31.845743, 391.4256, 19.057825, 17.871006, 18.655779, 115.42032, 318.83206]
2025-05-08 06:10:35,398 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [84.0, 190.0, 18.0, 45.0, 206.0, 18.0, 17.0, 18.0, 69.0, 143.0]
2025-05-08 06:10:35,406 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 80/100 (estimated time remaining: 53 minutes, 39 seconds)
2025-05-08 06:13:08,268 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 06:13:09,144 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 148.60095 ± 94.122
2025-05-08 06:13:09,144 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [19.76705, 256.2346, 34.547718, 100.37316, 300.33936, 161.5365, 269.0133, 143.08406, 63.585014, 137.52882]
2025-05-08 06:13:09,145 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [19.0, 121.0, 39.0, 70.0, 130.0, 80.0, 150.0, 84.0, 61.0, 92.0]
2025-05-08 06:13:09,153 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 81/100 (estimated time remaining: 51 minutes, 10 seconds)
2025-05-08 06:15:51,170 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 06:15:52,350 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 235.89957 ± 115.685
2025-05-08 06:15:52,350 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [257.75125, 115.65453, 228.67046, 334.86234, 330.82108, 14.807333, 331.43762, 95.54107, 267.50824, 381.9416]
2025-05-08 06:15:52,351 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [111.0, 72.0, 110.0, 134.0, 154.0, 15.0, 142.0, 62.0, 123.0, 201.0]
2025-05-08 06:15:52,351 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1124 [INFO]: New best (235.90) for latency ExtremeSparseL4U32
2025-05-08 06:15:52,351 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1127 [INFO]: saving network
2025-05-08 06:15:52,355 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-hopper/ExtremeSparseL4U32-sac/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 06:15:52,368 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 82/100 (estimated time remaining: 49 minutes, 14 seconds)
2025-05-08 06:18:24,517 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 06:18:25,639 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 224.32822 ± 105.013
2025-05-08 06:18:25,639 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [316.68033, 388.029, 345.9127, 173.18271, 209.68282, 201.33423, 277.46146, 127.64322, 186.97743, 16.378305]
2025-05-08 06:18:25,639 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [129.0, 204.0, 149.0, 93.0, 100.0, 97.0, 131.0, 73.0, 91.0, 16.0]
2025-05-08 06:18:25,648 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 83/100 (estimated time remaining: 46 minutes, 39 seconds)
2025-05-08 06:20:59,136 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 06:21:00,478 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 279.74469 ± 109.831
2025-05-08 06:21:00,478 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [365.39145, 206.15907, 17.73633, 330.66168, 163.62154, 360.5963, 378.72916, 303.42462, 339.19946, 331.92746]
2025-05-08 06:21:00,478 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [184.0, 100.0, 17.0, 158.0, 97.0, 165.0, 157.0, 138.0, 130.0, 128.0]
2025-05-08 06:21:00,478 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1124 [INFO]: New best (279.74) for latency ExtremeSparseL4U32
2025-05-08 06:21:00,479 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1127 [INFO]: saving network
2025-05-08 06:21:00,483 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-hopper/ExtremeSparseL4U32-sac/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 06:21:00,496 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 84/100 (estimated time remaining: 44 minutes, 6 seconds)
2025-05-08 06:23:33,429 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 06:23:34,107 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 129.37450 ± 149.411
2025-05-08 06:23:34,107 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [18.224842, 324.45782, 9.760192, 16.585966, 11.735044, 219.15926, 148.6011, 36.756363, 458.89, 49.57433]
2025-05-08 06:23:34,107 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [17.0, 141.0, 13.0, 16.0, 14.0, 115.0, 86.0, 36.0, 205.0, 36.0]
2025-05-08 06:23:34,117 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 85/100 (estimated time remaining: 41 minutes, 31 seconds)
2025-05-08 06:26:07,394 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 06:26:07,963 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 104.72084 ± 115.651
2025-05-08 06:26:07,963 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [19.138418, 278.1496, 89.09782, 14.330432, 329.0028, 19.186207, 214.79474, 10.238315, 35.57973, 37.69031]
2025-05-08 06:26:07,963 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [18.0, 106.0, 59.0, 16.0, 166.0, 18.0, 110.0, 15.0, 30.0, 42.0]
2025-05-08 06:26:07,972 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 86/100 (estimated time remaining: 38 minutes, 56 seconds)
2025-05-08 06:28:41,586 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 06:28:42,212 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 111.11033 ± 98.731
2025-05-08 06:28:42,212 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [16.088219, 103.67897, 36.269524, 70.50023, 39.321953, 231.02258, 108.95336, 41.25942, 114.750946, 349.25812]
2025-05-08 06:28:42,212 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [16.0, 65.0, 31.0, 41.0, 31.0, 109.0, 75.0, 31.0, 62.0, 144.0]
2025-05-08 06:28:42,222 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 87/100 (estimated time remaining: 35 minutes, 55 seconds)
2025-05-08 06:31:15,738 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 06:31:16,542 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 158.29691 ± 144.813
2025-05-08 06:31:16,543 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [36.239075, 407.4718, 319.02588, 17.432451, 252.87437, 17.50589, 12.55936, 18.389357, 237.02634, 264.4446]
2025-05-08 06:31:16,543 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [39.0, 180.0, 162.0, 17.0, 116.0, 17.0, 14.0, 18.0, 111.0, 122.0]
2025-05-08 06:31:16,552 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 88/100 (estimated time remaining: 33 minutes, 24 seconds)
2025-05-08 06:33:49,166 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 06:33:49,784 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 116.79423 ± 126.175
2025-05-08 06:33:49,784 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [19.33435, 12.463733, 237.59427, 276.5436, 9.693367, 258.7, 307.21194, 14.101286, 20.325134, 11.97444]
2025-05-08 06:33:49,784 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [18.0, 14.0, 116.0, 134.0, 14.0, 112.0, 136.0, 15.0, 19.0, 14.0]
2025-05-08 06:33:49,793 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 89/100 (estimated time remaining: 30 minutes, 46 seconds)
2025-05-08 06:36:22,963 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 06:36:24,004 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 204.88049 ± 123.935
2025-05-08 06:36:24,004 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [62.15449, 30.762987, 336.7518, 365.48428, 168.0036, 277.40225, 224.19847, 18.949392, 227.32025, 337.77725]
2025-05-08 06:36:24,004 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [45.0, 35.0, 165.0, 149.0, 88.0, 122.0, 127.0, 18.0, 99.0, 144.0]
2025-05-08 06:36:24,014 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 90/100 (estimated time remaining: 28 minutes, 13 seconds)
2025-05-08 06:38:57,173 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 06:38:58,039 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 174.23392 ± 135.516
2025-05-08 06:38:58,039 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [45.974506, 24.263721, 325.41257, 15.452553, 355.4363, 16.480278, 198.88339, 143.16927, 283.96082, 333.30585]
2025-05-08 06:38:58,039 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [42.0, 21.0, 152.0, 16.0, 166.0, 17.0, 113.0, 80.0, 141.0, 138.0]
2025-05-08 06:38:58,049 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 91/100 (estimated time remaining: 25 minutes, 40 seconds)
2025-05-08 06:41:31,774 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 06:41:32,524 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 121.39388 ± 101.268
2025-05-08 06:41:32,524 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [14.966474, 101.762535, 167.00642, 21.034443, 39.940777, 231.01077, 359.6318, 93.15849, 113.1639, 72.263214]
2025-05-08 06:41:32,524 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [15.0, 63.0, 112.0, 43.0, 36.0, 109.0, 173.0, 60.0, 62.0, 55.0]
2025-05-08 06:41:32,534 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 92/100 (estimated time remaining: 23 minutes, 6 seconds)
2025-05-08 06:44:05,229 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 06:44:05,747 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 92.64656 ± 143.942
2025-05-08 06:44:05,747 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [23.393814, 11.588033, 9.591727, 13.254549, 70.329994, 327.98965, 422.8187, 11.606236, 18.116129, 17.776707]
2025-05-08 06:44:05,747 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [21.0, 14.0, 13.0, 15.0, 54.0, 147.0, 181.0, 14.0, 18.0, 19.0]
2025-05-08 06:44:05,758 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 93/100 (estimated time remaining: 20 minutes, 30 seconds)
2025-05-08 06:46:40,306 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 06:46:41,461 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 224.58377 ± 103.749
2025-05-08 06:46:41,461 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [225.75896, 31.402029, 268.89053, 193.11588, 48.72622, 268.51828, 319.7514, 215.27266, 337.22876, 337.17297]
2025-05-08 06:46:41,461 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [128.0, 36.0, 109.0, 106.0, 49.0, 140.0, 145.0, 107.0, 127.0, 148.0]
2025-05-08 06:46:41,471 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 94/100 (estimated time remaining: 18 minutes)
2025-05-08 06:49:12,733 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 06:49:13,939 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 259.98068 ± 81.192
2025-05-08 06:49:13,939 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [311.90576, 327.36063, 195.17729, 382.3299, 207.32095, 297.7412, 310.40265, 280.5821, 194.54027, 92.446236]
2025-05-08 06:49:13,939 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [135.0, 125.0, 119.0, 171.0, 98.0, 128.0, 138.0, 111.0, 113.0, 84.0]
2025-05-08 06:49:13,949 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 95/100 (estimated time remaining: 15 minutes, 23 seconds)
2025-05-08 06:51:46,899 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 06:51:47,613 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 108.84599 ± 84.293
2025-05-08 06:51:47,613 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [66.2437, 110.780365, 295.10568, 56.841713, 72.36755, 13.01795, 82.808304, 240.75551, 53.010883, 97.52838]
2025-05-08 06:51:47,614 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [59.0, 72.0, 172.0, 39.0, 52.0, 14.0, 56.0, 152.0, 45.0, 72.0]
2025-05-08 06:51:47,624 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 96/100 (estimated time remaining: 12 minutes, 49 seconds)
2025-05-08 06:54:20,497 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 06:54:21,119 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 121.36249 ± 129.843
2025-05-08 06:54:21,120 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [299.01993, 61.089775, 18.75451, 17.961391, 409.46884, 14.515368, 116.36078, 187.37453, 16.537249, 72.54243]
2025-05-08 06:54:21,120 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [141.0, 42.0, 17.0, 17.0, 157.0, 15.0, 62.0, 128.0, 16.0, 44.0]
2025-05-08 06:54:21,130 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 97/100 (estimated time remaining: 10 minutes, 14 seconds)
2025-05-08 06:56:54,094 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 06:56:54,839 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 131.82910 ± 111.397
2025-05-08 06:56:54,840 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [34.07933, 13.400484, 10.581528, 211.87468, 77.782295, 376.32797, 152.72795, 118.07355, 77.19134, 246.25192]
2025-05-08 06:56:54,840 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [36.0, 15.0, 14.0, 105.0, 61.0, 163.0, 98.0, 82.0, 66.0, 120.0]
2025-05-08 06:56:54,850 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 98/100 (estimated time remaining: 7 minutes, 41 seconds)
2025-05-08 06:59:27,199 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 06:59:27,793 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 94.91929 ± 89.813
2025-05-08 06:59:27,793 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [41.533607, 21.296955, 213.4264, 309.43774, 46.5525, 39.39624, 92.672424, 40.599724, 110.32417, 33.953094]
2025-05-08 06:59:27,793 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [33.0, 19.0, 105.0, 142.0, 44.0, 41.0, 63.0, 32.0, 67.0, 29.0]
2025-05-08 06:59:27,804 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 99/100 (estimated time remaining: 5 minutes, 6 seconds)
2025-05-08 07:02:01,801 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 07:02:02,474 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 125.24410 ± 122.918
2025-05-08 07:02:02,474 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [321.1481, 245.04335, 18.807444, 20.28871, 203.03131, 306.6895, 14.064073, 11.804924, 22.907206, 88.656395]
2025-05-08 07:02:02,474 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [158.0, 120.0, 18.0, 19.0, 98.0, 126.0, 17.0, 15.0, 21.0, 58.0]
2025-05-08 07:02:02,486 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 33 seconds)
2025-05-08 07:04:34,231 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 07:04:34,900 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 114.11311 ± 109.109
2025-05-08 07:04:34,900 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [227.79066, 355.2458, 47.490524, 12.851291, 8.793903, 216.6441, 62.889057, 101.93697, 82.22564, 25.26314]
2025-05-08 07:04:34,900 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [109.0, 140.0, 36.0, 15.0, 13.0, 118.0, 58.0, 63.0, 63.0, 27.0]
2025-05-08 07:04:34,911 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1149 [DEBUG]: Training session finished
