2025-05-08 04:52:29,023 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1006 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noisy-hopper/ExtremeSparseL4U32-sac-aug-mem4
2025-05-08 04:52:29,023 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1007 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noisy-hopper/ExtremeSparseL4U32-sac-aug-mem4
2025-05-08 04:52:29,023 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1008 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x7c447abc3f10>}
2025-05-08 04:52:29,023 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1009 [DEBUG]: using device: cpu
2025-05-08 04:52:29,024 baseline-sac-noisy-hopper:77 [WARNING]: args.memorize_actions != args.horizon: 4 != 32
2025-05-08 04:52:29,031 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1031 [INFO]: Creating new trainer
2025-05-08 04:52:29,042 baseline-sac-noisy-hopper:111 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=23, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2.]]), shift: tensor([[-1., -1., -1.]]))
)
2025-05-08 04:52:29,042 baseline-sac-noisy-hopper:112 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=26, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-08 04:52:29,269 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1092 [DEBUG]: Starting training session...
2025-05-08 04:52:29,269 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 1/100
2025-05-08 04:55:01,564 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 04:55:02,055 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 56.43426 ± 11.419
2025-05-08 04:55:02,055 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [84.05939, 44.720673, 49.667816, 49.749527, 47.840233, 49.439022, 51.4996, 57.35277, 62.003323, 68.0102]
2025-05-08 04:55:02,055 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [67.0, 26.0, 30.0, 30.0, 29.0, 30.0, 31.0, 36.0, 42.0, 40.0]
2025-05-08 04:55:02,056 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1124 [INFO]: New best (56.43) for latency ExtremeSparseL4U32
2025-05-08 04:55:02,056 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1127 [INFO]: saving network
2025-05-08 04:55:02,062 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-hopper/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 04:55:02,069 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 2/100 (estimated time remaining: 4 hours, 12 minutes, 7 seconds)
2025-05-08 04:57:38,846 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 04:57:40,655 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 144.02937 ± 118.428
2025-05-08 04:57:40,655 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [39.081394, 39.35595, 108.077965, 43.552193, 271.37747, 132.26898, 260.2466, 34.620567, 397.36737, 114.34524]
2025-05-08 04:57:40,655 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [45.0, 39.0, 116.0, 49.0, 296.0, 148.0, 260.0, 40.0, 414.0, 120.0]
2025-05-08 04:57:40,655 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1124 [INFO]: New best (144.03) for latency ExtremeSparseL4U32
2025-05-08 04:57:40,655 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1127 [INFO]: saving network
2025-05-08 04:57:40,659 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-hopper/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 04:57:40,665 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 3/100 (estimated time remaining: 4 hours, 14 minutes, 18 seconds)
2025-05-08 05:00:20,851 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 05:00:23,157 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 263.63495 ± 119.163
2025-05-08 05:00:23,157 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [343.43475, 108.13355, 246.95303, 316.80002, 366.882, 422.88046, 194.54857, 267.31213, 350.65372, 18.751215]
2025-05-08 05:00:23,157 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [240.0, 121.0, 181.0, 208.0, 266.0, 322.0, 151.0, 234.0, 228.0, 22.0]
2025-05-08 05:00:23,158 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1124 [INFO]: New best (263.63) for latency ExtremeSparseL4U32
2025-05-08 05:00:23,158 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1127 [INFO]: saving network
2025-05-08 05:00:23,161 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-hopper/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 05:00:23,168 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 4/100 (estimated time remaining: 4 hours, 15 minutes, 22 seconds)
2025-05-08 05:03:00,587 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 05:03:01,954 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 230.20195 ± 102.777
2025-05-08 05:03:01,954 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [304.403, 66.92231, 21.467484, 261.37857, 174.13022, 297.32788, 316.10623, 266.3088, 339.81516, 254.15991]
2025-05-08 05:03:01,954 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [174.0, 46.0, 27.0, 127.0, 97.0, 167.0, 160.0, 130.0, 175.0, 121.0]
2025-05-08 05:03:01,955 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 5/100 (estimated time remaining: 4 hours, 13 minutes, 4 seconds)
2025-05-08 05:05:36,987 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 05:05:38,171 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 220.90752 ± 106.020
2025-05-08 05:05:38,171 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [323.68866, 318.89456, 326.67706, 132.76117, 325.4588, 314.94028, 180.14998, 142.0492, 98.82772, 45.62803]
2025-05-08 05:05:38,171 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [144.0, 142.0, 154.0, 82.0, 151.0, 144.0, 99.0, 97.0, 65.0, 61.0]
2025-05-08 05:05:38,173 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 6/100 (estimated time remaining: 4 hours, 9 minutes, 49 seconds)
2025-05-08 05:08:11,260 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 05:08:12,618 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 231.53496 ± 146.131
2025-05-08 05:08:12,618 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [342.25778, 335.36337, 28.655548, 395.38287, 72.56314, 352.40042, 327.54767, 84.80776, 33.08829, 343.2828]
2025-05-08 05:08:12,618 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [168.0, 162.0, 44.0, 216.0, 49.0, 189.0, 162.0, 72.0, 47.0, 169.0]
2025-05-08 05:08:12,620 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 7/100 (estimated time remaining: 4 hours, 7 minutes, 42 seconds)
2025-05-08 05:10:46,195 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 05:10:47,307 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 209.14548 ± 115.049
2025-05-08 05:10:47,307 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [348.3943, 324.6052, 133.31407, 126.14572, 123.88324, 77.968185, 30.706806, 284.91113, 324.5315, 316.99457]
2025-05-08 05:10:47,307 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [185.0, 147.0, 73.0, 74.0, 72.0, 50.0, 33.0, 138.0, 150.0, 145.0]
2025-05-08 05:10:47,309 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 8/100 (estimated time remaining: 4 hours, 3 minutes, 51 seconds)
2025-05-08 05:13:20,368 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 05:13:21,805 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 268.60571 ± 107.562
2025-05-08 05:13:21,805 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [30.657446, 336.04117, 299.95676, 305.38965, 84.96581, 352.83588, 347.92725, 316.2921, 308.5705, 303.42056]
2025-05-08 05:13:21,805 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [36.0, 156.0, 134.0, 133.0, 51.0, 210.0, 164.0, 139.0, 133.0, 134.0]
2025-05-08 05:13:21,805 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1124 [INFO]: New best (268.61) for latency ExtremeSparseL4U32
2025-05-08 05:13:21,805 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1127 [INFO]: saving network
2025-05-08 05:13:21,809 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-hopper/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 05:13:21,817 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 9/100 (estimated time remaining: 3 hours, 58 minutes, 47 seconds)
2025-05-08 05:16:00,879 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 05:16:02,249 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 265.21295 ± 89.983
2025-05-08 05:16:02,249 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [309.23618, 314.38516, 305.1215, 313.43515, 98.89789, 298.77063, 72.91858, 317.18463, 314.2915, 307.8885]
2025-05-08 05:16:02,249 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [138.0, 137.0, 136.0, 139.0, 56.0, 159.0, 50.0, 141.0, 143.0, 135.0]
2025-05-08 05:16:02,251 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 10/100 (estimated time remaining: 3 hours, 56 minutes, 41 seconds)
2025-05-08 05:18:37,821 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 05:18:39,074 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 253.89690 ± 116.801
2025-05-08 05:18:39,074 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [246.14, 317.9192, 321.6781, 323.86816, 13.313029, 324.28522, 318.25247, 319.13474, 318.211, 36.167236]
2025-05-08 05:18:39,074 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [126.0, 134.0, 140.0, 139.0, 13.0, 138.0, 132.0, 136.0, 132.0, 40.0]
2025-05-08 05:18:39,076 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 11/100 (estimated time remaining: 3 hours, 54 minutes, 16 seconds)
2025-05-08 05:21:16,317 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 05:21:17,627 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 267.79407 ± 78.274
2025-05-08 05:21:17,627 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [208.58322, 317.3134, 312.19806, 104.309654, 315.5468, 308.80136, 320.20004, 310.96677, 149.19333, 330.8281]
2025-05-08 05:21:17,628 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [106.0, 132.0, 133.0, 59.0, 133.0, 130.0, 133.0, 132.0, 78.0, 149.0]
2025-05-08 05:21:17,630 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 12/100 (estimated time remaining: 3 hours, 52 minutes, 53 seconds)
2025-05-08 05:23:55,002 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 05:23:56,575 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 293.41864 ± 89.009
2025-05-08 05:23:56,575 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [325.93954, 319.79376, 346.0485, 311.73645, 309.19788, 317.14978, 131.83086, 425.4748, 324.08423, 122.93056]
2025-05-08 05:23:56,576 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [143.0, 135.0, 165.0, 135.0, 138.0, 145.0, 77.0, 250.0, 142.0, 74.0]
2025-05-08 05:23:56,576 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1124 [INFO]: New best (293.42) for latency ExtremeSparseL4U32
2025-05-08 05:23:56,576 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1127 [INFO]: saving network
2025-05-08 05:23:56,580 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-hopper/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 05:23:56,587 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 13/100 (estimated time remaining: 3 hours, 51 minutes, 31 seconds)
2025-05-08 05:26:33,427 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 05:26:34,935 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 301.91595 ± 92.638
2025-05-08 05:26:34,935 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [347.47116, 324.3995, 322.3234, 339.28958, 344.5873, 25.099087, 327.3545, 334.50403, 326.04367, 328.08725]
2025-05-08 05:26:34,936 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [156.0, 135.0, 149.0, 147.0, 156.0, 28.0, 139.0, 143.0, 139.0, 138.0]
2025-05-08 05:26:34,936 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1124 [INFO]: New best (301.92) for latency ExtremeSparseL4U32
2025-05-08 05:26:34,936 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1127 [INFO]: saving network
2025-05-08 05:26:34,940 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-hopper/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 05:26:34,947 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 14/100 (estimated time remaining: 3 hours, 50 minutes)
2025-05-08 05:29:12,226 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 05:29:13,523 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 230.56055 ± 135.698
2025-05-08 05:29:13,523 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [323.42993, 38.48093, 141.97223, 344.85788, 328.3111, 329.3852, 47.942142, 44.181435, 347.94193, 359.10275]
2025-05-08 05:29:13,523 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [135.0, 29.0, 95.0, 158.0, 145.0, 184.0, 29.0, 49.0, 155.0, 173.0]
2025-05-08 05:29:13,526 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 15/100 (estimated time remaining: 3 hours, 46 minutes, 49 seconds)
2025-05-08 05:31:50,246 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 05:31:51,782 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 301.64145 ± 67.922
2025-05-08 05:31:51,782 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [317.29684, 323.75366, 196.82938, 345.0966, 327.311, 146.66907, 376.08716, 339.6414, 326.6757, 317.0536]
2025-05-08 05:31:51,782 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [131.0, 134.0, 114.0, 163.0, 141.0, 80.0, 188.0, 154.0, 136.0, 131.0]
2025-05-08 05:31:51,785 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 16/100 (estimated time remaining: 3 hours, 44 minutes, 36 seconds)
2025-05-08 05:34:30,281 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 05:34:31,739 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 275.99991 ± 93.045
2025-05-08 05:34:31,739 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [56.903004, 362.83746, 316.01288, 243.02861, 316.15298, 315.69907, 348.68784, 327.3494, 152.58559, 320.74188]
2025-05-08 05:34:31,739 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [38.0, 180.0, 137.0, 122.0, 140.0, 151.0, 167.0, 146.0, 79.0, 140.0]
2025-05-08 05:34:31,742 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 17/100 (estimated time remaining: 3 hours, 42 minutes, 21 seconds)
2025-05-08 05:37:07,608 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 05:37:09,079 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 312.44617 ± 11.448
2025-05-08 05:37:09,079 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [319.59213, 320.36087, 315.40466, 315.4216, 316.5128, 315.92947, 314.26654, 310.54742, 278.9958, 317.43008]
2025-05-08 05:37:09,079 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [141.0, 134.0, 140.0, 133.0, 132.0, 133.0, 129.0, 128.0, 124.0, 133.0]
2025-05-08 05:37:09,080 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1124 [INFO]: New best (312.45) for latency ExtremeSparseL4U32
2025-05-08 05:37:09,080 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1127 [INFO]: saving network
2025-05-08 05:37:09,084 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-hopper/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 05:37:09,091 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 18/100 (estimated time remaining: 3 hours, 39 minutes, 15 seconds)
2025-05-08 05:39:46,256 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 05:39:47,805 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 304.90445 ± 60.550
2025-05-08 05:39:47,806 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [331.47385, 124.27595, 336.55334, 331.58948, 321.63492, 321.48077, 319.60312, 318.12875, 328.3348, 315.96945]
2025-05-08 05:39:47,806 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [161.0, 75.0, 169.0, 155.0, 138.0, 137.0, 133.0, 146.0, 140.0, 132.0]
2025-05-08 05:39:47,808 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 19/100 (estimated time remaining: 3 hours, 36 minutes, 42 seconds)
2025-05-08 05:42:26,000 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 05:42:27,457 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 281.81186 ± 95.836
2025-05-08 05:42:27,457 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [358.77017, 323.32205, 325.38907, 318.64575, 289.8429, 352.65585, 327.28235, 320.3168, 159.62227, 42.2715]
2025-05-08 05:42:27,457 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [176.0, 136.0, 140.0, 142.0, 132.0, 170.0, 143.0, 136.0, 90.0, 38.0]
2025-05-08 05:42:27,460 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 20/100 (estimated time remaining: 3 hours, 34 minutes, 21 seconds)
2025-05-08 05:45:05,461 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 05:45:06,437 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 151.72269 ± 123.977
2025-05-08 05:45:06,437 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [285.0864, 20.418365, 341.9709, 339.6312, 101.29282, 185.40407, 33.62115, 149.3858, 27.960012, 32.45617]
2025-05-08 05:45:06,437 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [147.0, 25.0, 156.0, 158.0, 71.0, 129.0, 40.0, 83.0, 33.0, 33.0]
2025-05-08 05:45:06,440 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 21/100 (estimated time remaining: 3 hours, 31 minutes, 54 seconds)
2025-05-08 05:47:43,133 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 05:47:44,462 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 219.09915 ± 138.073
2025-05-08 05:47:44,463 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [51.003, 270.8179, 53.16295, 387.45596, 338.10452, 320.04398, 339.88605, 85.2254, 26.610071, 318.68158]
2025-05-08 05:47:44,463 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [52.0, 148.0, 56.0, 252.0, 160.0, 136.0, 150.0, 53.0, 38.0, 135.0]
2025-05-08 05:47:44,466 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 22/100 (estimated time remaining: 3 hours, 28 minutes, 45 seconds)
2025-05-08 05:50:20,550 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 05:50:21,956 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 287.77725 ± 68.867
2025-05-08 05:50:21,956 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [321.81183, 193.53189, 334.15683, 119.678345, 335.38818, 314.42548, 318.0137, 333.66003, 319.10294, 288.0031]
2025-05-08 05:50:21,956 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [135.0, 93.0, 149.0, 67.0, 151.0, 132.0, 133.0, 149.0, 133.0, 127.0]
2025-05-08 05:50:21,960 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 23/100 (estimated time remaining: 3 hours, 26 minutes, 8 seconds)
2025-05-08 05:53:00,447 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 05:53:01,995 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 309.46909 ± 51.279
2025-05-08 05:53:01,995 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [299.77127, 300.91528, 326.4152, 344.36267, 324.8133, 330.9635, 340.24847, 335.29236, 330.34476, 161.56424]
2025-05-08 05:53:01,995 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [126.0, 131.0, 146.0, 156.0, 143.0, 148.0, 148.0, 158.0, 143.0, 84.0]
2025-05-08 05:53:01,999 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 24/100 (estimated time remaining: 3 hours, 23 minutes, 50 seconds)
2025-05-08 05:55:39,218 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 05:55:40,407 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 232.62845 ± 119.595
2025-05-08 05:55:40,407 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [44.954, 162.73253, 31.054014, 342.19684, 133.3364, 330.59875, 324.19086, 317.4189, 316.56165, 323.24045]
2025-05-08 05:55:40,407 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [26.0, 97.0, 42.0, 161.0, 67.0, 145.0, 138.0, 132.0, 132.0, 134.0]
2025-05-08 05:55:40,410 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 25/100 (estimated time remaining: 3 hours, 20 minutes, 52 seconds)
2025-05-08 05:58:17,968 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 05:58:19,239 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 235.27451 ± 113.740
2025-05-08 05:58:19,239 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [40.61911, 76.44315, 91.51865, 315.92767, 315.93484, 322.48727, 314.16632, 267.8497, 368.1739, 239.62454]
2025-05-08 05:58:19,239 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [47.0, 76.0, 60.0, 129.0, 129.0, 135.0, 129.0, 115.0, 187.0, 131.0]
2025-05-08 05:58:19,243 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 26/100 (estimated time remaining: 3 hours, 18 minutes, 12 seconds)
2025-05-08 06:00:55,725 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 06:00:57,098 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 269.49579 ± 113.281
2025-05-08 06:00:57,098 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [315.00427, 317.4522, 13.018612, 355.42014, 80.77524, 351.04657, 315.66907, 311.8126, 320.23218, 314.52716]
2025-05-08 06:00:57,099 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [131.0, 148.0, 32.0, 172.0, 51.0, 165.0, 130.0, 136.0, 133.0, 130.0]
2025-05-08 06:00:57,102 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 27/100 (estimated time remaining: 3 hours, 15 minutes, 31 seconds)
2025-05-08 06:03:34,766 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 06:03:36,162 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 290.88568 ± 84.097
2025-05-08 06:03:36,163 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [321.64563, 318.71204, 354.4065, 314.71747, 42.95876, 320.37503, 317.41357, 312.12766, 320.5955, 285.90488]
2025-05-08 06:03:36,163 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [135.0, 134.0, 172.0, 130.0, 32.0, 138.0, 133.0, 131.0, 133.0, 124.0]
2025-05-08 06:03:36,167 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 28/100 (estimated time remaining: 3 hours, 13 minutes, 15 seconds)
2025-05-08 06:06:14,415 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 06:06:15,785 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 273.73285 ± 89.459
2025-05-08 06:06:15,786 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [321.49857, 320.94565, 258.19412, 37.46214, 335.14014, 350.83746, 317.00757, 285.91287, 314.43723, 195.89273]
2025-05-08 06:06:15,786 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [137.0, 133.0, 133.0, 29.0, 156.0, 163.0, 131.0, 126.0, 129.0, 95.0]
2025-05-08 06:06:15,789 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 29/100 (estimated time remaining: 3 hours, 10 minutes, 30 seconds)
2025-05-08 06:08:52,308 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 06:08:53,668 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 262.26758 ± 104.063
2025-05-08 06:08:53,669 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [326.84943, 160.66328, 329.3264, 338.8246, 319.11722, 336.82837, 317.86798, 40.214848, 325.7126, 127.27111]
2025-05-08 06:08:53,669 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [139.0, 84.0, 145.0, 150.0, 134.0, 151.0, 134.0, 39.0, 174.0, 76.0]
2025-05-08 06:08:53,673 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 30/100 (estimated time remaining: 3 hours, 7 minutes, 44 seconds)
2025-05-08 06:11:31,044 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 06:11:32,239 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 251.70451 ± 109.127
2025-05-08 06:11:32,239 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [104.13571, 333.7192, 321.4741, 323.90518, 318.78503, 318.77325, 109.66476, 46.72248, 318.8812, 320.98444]
2025-05-08 06:11:32,239 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [79.0, 147.0, 136.0, 139.0, 132.0, 132.0, 67.0, 55.0, 132.0, 134.0]
2025-05-08 06:11:32,243 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 31/100 (estimated time remaining: 3 hours, 5 minutes, 1 second)
2025-05-08 06:14:21,155 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 06:14:22,693 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 257.40997 ± 112.303
2025-05-08 06:14:22,693 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [293.70126, 325.4888, 316.8802, 99.74961, 384.2127, 143.20491, 38.368263, 350.29282, 322.9315, 299.26984]
2025-05-08 06:14:22,693 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [119.0, 136.0, 139.0, 55.0, 198.0, 72.0, 52.0, 166.0, 133.0, 124.0]
2025-05-08 06:14:22,698 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 32/100 (estimated time remaining: 3 hours, 5 minutes, 17 seconds)
2025-05-08 06:17:07,322 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 06:17:08,871 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 281.23193 ± 92.062
2025-05-08 06:17:08,871 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [336.36447, 330.74744, 313.39072, 225.63164, 186.36012, 326.49365, 301.1989, 337.90396, 61.857468, 392.371]
2025-05-08 06:17:08,872 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [152.0, 144.0, 145.0, 114.0, 97.0, 141.0, 140.0, 152.0, 46.0, 204.0]
2025-05-08 06:17:08,876 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 33/100 (estimated time remaining: 3 hours, 4 minutes, 12 seconds)
2025-05-08 06:19:50,005 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 06:19:51,481 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 280.58502 ± 88.397
2025-05-08 06:19:51,481 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [335.69308, 197.31918, 337.5892, 338.47076, 208.46468, 329.87485, 329.80948, 65.446556, 335.39465, 327.7877]
2025-05-08 06:19:51,481 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [147.0, 127.0, 152.0, 152.0, 104.0, 138.0, 136.0, 40.0, 147.0, 134.0]
2025-05-08 06:19:51,486 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 34/100 (estimated time remaining: 3 hours, 2 minutes, 10 seconds)
2025-05-08 06:22:31,965 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 06:22:33,468 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 282.25540 ± 86.151
2025-05-08 06:22:33,468 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [343.6129, 127.70255, 330.26013, 325.003, 330.53366, 333.27423, 383.2253, 134.17169, 211.87668, 302.8938]
2025-05-08 06:22:33,468 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [150.0, 67.0, 142.0, 135.0, 140.0, 142.0, 193.0, 70.0, 125.0, 135.0]
2025-05-08 06:22:33,473 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 35/100 (estimated time remaining: 3 hours, 21 seconds)
2025-05-08 06:25:16,962 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 06:25:18,623 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 329.46661 ± 20.516
2025-05-08 06:25:18,623 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [337.4205, 347.14178, 311.7236, 317.30142, 313.0472, 382.58777, 318.4561, 324.12976, 321.5836, 321.2747]
2025-05-08 06:25:18,623 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [148.0, 163.0, 130.0, 134.0, 130.0, 193.0, 131.0, 135.0, 136.0, 133.0]
2025-05-08 06:25:18,623 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1124 [INFO]: New best (329.47) for latency ExtremeSparseL4U32
2025-05-08 06:25:18,624 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1127 [INFO]: saving network
2025-05-08 06:25:18,628 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-hopper/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 06:25:18,638 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 59 minutes, 3 seconds)
2025-05-08 06:27:59,082 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 06:28:00,514 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 277.82056 ± 107.331
2025-05-08 06:28:00,514 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [324.27872, 323.21625, 310.99112, 373.80698, 107.0964, 30.025015, 343.66373, 321.95172, 322.63333, 320.5421]
2025-05-08 06:28:00,514 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [133.0, 134.0, 129.0, 188.0, 58.0, 36.0, 155.0, 135.0, 133.0, 133.0]
2025-05-08 06:28:00,519 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 54 minutes, 28 seconds)
2025-05-08 06:30:42,841 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 06:30:44,029 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 220.19437 ± 130.906
2025-05-08 06:30:44,029 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [129.16855, 33.559723, 336.4413, 282.12613, 335.3841, 327.00287, 57.250164, 35.13776, 333.60803, 332.26498]
2025-05-08 06:30:44,029 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [73.0, 30.0, 152.0, 120.0, 148.0, 139.0, 41.0, 30.0, 147.0, 146.0]
2025-05-08 06:30:44,035 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 51 minutes, 10 seconds)
2025-05-08 06:33:22,934 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 06:33:24,425 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 298.97162 ± 70.557
2025-05-08 06:33:24,425 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [311.59598, 321.00275, 334.74408, 318.58865, 326.49387, 320.06323, 87.98403, 322.05325, 321.33075, 325.85938]
2025-05-08 06:33:24,425 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [135.0, 134.0, 145.0, 132.0, 152.0, 134.0, 53.0, 134.0, 134.0, 138.0]
2025-05-08 06:33:24,430 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 48 minutes)
2025-05-08 06:36:07,110 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 06:36:08,743 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 306.95035 ± 89.193
2025-05-08 06:36:08,743 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [51.690228, 378.33356, 334.1624, 332.25897, 331.11356, 322.85968, 287.44318, 320.08542, 326.10306, 385.45337]
2025-05-08 06:36:08,743 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [54.0, 186.0, 148.0, 147.0, 145.0, 134.0, 118.0, 134.0, 135.0, 200.0]
2025-05-08 06:36:08,748 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 45 minutes, 46 seconds)
2025-05-08 06:38:50,065 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 06:38:51,745 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 305.16919 ± 77.554
2025-05-08 06:38:51,745 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [341.95233, 350.05884, 348.39664, 217.51097, 343.9079, 352.80817, 103.57645, 339.43787, 345.54822, 308.4945]
2025-05-08 06:38:51,746 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [152.0, 165.0, 164.0, 118.0, 158.0, 175.0, 61.0, 154.0, 160.0, 135.0]
2025-05-08 06:38:51,752 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 41/100 (estimated time remaining: 2 hours, 42 minutes, 37 seconds)
2025-05-08 06:41:32,422 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 06:41:33,924 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 293.39481 ± 80.748
2025-05-08 06:41:33,924 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [363.68225, 320.85275, 313.44492, 314.11627, 325.22763, 205.80945, 82.81729, 334.56708, 326.70316, 346.7272]
2025-05-08 06:41:33,924 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [176.0, 133.0, 130.0, 129.0, 135.0, 99.0, 59.0, 146.0, 137.0, 158.0]
2025-05-08 06:41:33,930 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 42/100 (estimated time remaining: 2 hours, 39 minutes, 58 seconds)
2025-05-08 06:44:15,381 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 06:44:16,746 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 282.88293 ± 89.669
2025-05-08 06:44:16,746 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [315.48892, 322.9899, 287.1381, 326.2987, 37.839657, 343.41464, 323.7717, 207.99081, 328.3966, 335.5003]
2025-05-08 06:44:16,746 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [130.0, 136.0, 125.0, 136.0, 49.0, 152.0, 135.0, 105.0, 134.0, 141.0]
2025-05-08 06:44:16,752 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 43/100 (estimated time remaining: 2 hours, 37 minutes, 7 seconds)
2025-05-08 06:47:28,590 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 06:47:30,256 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 257.20111 ± 118.059
2025-05-08 06:47:30,256 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [337.17963, 293.1431, 344.33853, 129.56972, 335.10245, 56.643112, 55.2248, 342.6395, 341.62418, 336.5461]
2025-05-08 06:47:30,257 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [148.0, 132.0, 151.0, 71.0, 151.0, 53.0, 62.0, 152.0, 145.0, 146.0]
2025-05-08 06:47:30,264 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 44/100 (estimated time remaining: 2 hours, 40 minutes, 42 seconds)
2025-05-08 06:50:07,225 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 06:50:08,557 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 273.40326 ± 85.050
2025-05-08 06:50:08,557 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [314.87964, 320.3266, 311.5957, 187.55023, 331.55426, 220.87498, 323.25485, 314.80856, 63.57257, 345.61542]
2025-05-08 06:50:08,557 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [132.0, 135.0, 133.0, 91.0, 146.0, 103.0, 136.0, 132.0, 45.0, 159.0]
2025-05-08 06:50:08,563 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 45/100 (estimated time remaining: 2 hours, 36 minutes, 45 seconds)
2025-05-08 06:52:45,116 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 06:52:46,707 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 290.33112 ± 105.564
2025-05-08 06:52:46,707 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [355.95578, 358.2311, 350.4791, 319.45453, 34.860806, 363.73944, 350.47168, 161.76048, 362.7835, 245.5748]
2025-05-08 06:52:46,707 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [167.0, 168.0, 160.0, 157.0, 39.0, 179.0, 159.0, 95.0, 173.0, 137.0]
2025-05-08 06:52:46,713 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 46/100 (estimated time remaining: 2 hours, 33 minutes, 4 seconds)
2025-05-08 06:55:23,152 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 06:55:24,217 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 193.37570 ± 125.650
2025-05-08 06:55:24,217 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [329.22394, 123.93904, 111.01837, 35.506496, 240.79027, 60.840683, 330.23227, 35.356472, 324.58447, 342.26486]
2025-05-08 06:55:24,217 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [139.0, 69.0, 73.0, 45.0, 125.0, 59.0, 143.0, 28.0, 136.0, 152.0]
2025-05-08 06:55:24,223 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 47/100 (estimated time remaining: 2 hours, 29 minutes, 27 seconds)
2025-05-08 06:57:58,853 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 06:58:00,033 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 222.68802 ± 121.526
2025-05-08 06:58:00,033 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [75.33934, 62.31915, 317.73383, 327.3908, 331.973, 333.9136, 156.1799, 257.5629, 28.88482, 335.58273]
2025-05-08 06:58:00,033 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [52.0, 61.0, 138.0, 134.0, 156.0, 143.0, 77.0, 135.0, 34.0, 142.0]
2025-05-08 06:58:00,039 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 48/100 (estimated time remaining: 2 hours, 25 minutes, 26 seconds)
2025-05-08 07:00:36,930 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 07:00:38,450 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 291.74329 ± 92.898
2025-05-08 07:00:38,450 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [32.458984, 297.52225, 238.90288, 379.31482, 331.6439, 331.42776, 343.0223, 324.72778, 328.82443, 309.58755]
2025-05-08 07:00:38,451 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [25.0, 161.0, 151.0, 196.0, 138.0, 141.0, 154.0, 136.0, 139.0, 131.0]
2025-05-08 07:00:38,457 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 49/100 (estimated time remaining: 2 hours, 16 minutes, 37 seconds)
2025-05-08 07:03:14,012 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 07:03:15,378 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 277.13205 ± 91.137
2025-05-08 07:03:15,378 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [344.26178, 140.51054, 332.54144, 324.8707, 92.59598, 354.66913, 331.40903, 321.15808, 332.75803, 196.54596]
2025-05-08 07:03:15,378 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [152.0, 74.0, 141.0, 132.0, 55.0, 159.0, 136.0, 130.0, 139.0, 118.0]
2025-05-08 07:03:15,385 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 50/100 (estimated time remaining: 2 hours, 13 minutes, 45 seconds)
2025-05-08 07:05:48,569 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 07:05:50,014 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 317.70605 ± 55.900
2025-05-08 07:05:50,014 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [151.60002, 349.36395, 340.64243, 330.14795, 331.00262, 348.70966, 337.62424, 334.01736, 327.5447, 326.40744]
2025-05-08 07:05:50,014 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [80.0, 153.0, 149.0, 141.0, 142.0, 152.0, 145.0, 138.0, 132.0, 133.0]
2025-05-08 07:05:50,019 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 51/100 (estimated time remaining: 2 hours, 10 minutes, 33 seconds)
2025-05-08 07:08:24,037 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 07:08:25,248 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 246.88168 ± 108.776
2025-05-08 07:08:25,248 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [327.18646, 331.74728, 320.16687, 321.0579, 142.4559, 36.373978, 346.53024, 120.97876, 182.44643, 339.87308]
2025-05-08 07:08:25,248 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [135.0, 137.0, 132.0, 147.0, 79.0, 44.0, 153.0, 67.0, 119.0, 148.0]
2025-05-08 07:08:25,255 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 52/100 (estimated time remaining: 2 hours, 7 minutes, 34 seconds)
2025-05-08 07:10:58,621 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 07:10:59,906 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 260.54462 ± 107.337
2025-05-08 07:10:59,906 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [157.38348, 38.529884, 353.9673, 218.58769, 357.9159, 143.5312, 340.40244, 337.27994, 328.24146, 329.6066]
2025-05-08 07:10:59,906 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [88.0, 45.0, 166.0, 114.0, 172.0, 75.0, 151.0, 143.0, 135.0, 134.0]
2025-05-08 07:10:59,912 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 53/100 (estimated time remaining: 2 hours, 4 minutes, 46 seconds)
2025-05-08 07:13:32,399 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 07:13:33,448 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 220.15141 ± 135.132
2025-05-08 07:13:33,448 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [320.7302, 329.2807, 307.26053, 332.95737, 94.99088, 60.960453, 21.184587, 350.14102, 48.49395, 335.5144]
2025-05-08 07:13:33,448 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [132.0, 138.0, 126.0, 144.0, 50.0, 49.0, 25.0, 163.0, 29.0, 146.0]
2025-05-08 07:13:33,454 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 54/100 (estimated time remaining: 2 hours, 1 minute, 24 seconds)
2025-05-08 07:16:07,982 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 07:16:09,348 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 252.16829 ± 149.348
2025-05-08 07:16:09,349 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [53.798294, 27.817207, 409.20996, 354.82202, 357.70044, 186.35606, 364.4945, 39.762096, 366.37857, 361.34384]
2025-05-08 07:16:09,349 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [36.0, 33.0, 216.0, 166.0, 166.0, 112.0, 171.0, 30.0, 173.0, 171.0]
2025-05-08 07:16:09,355 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 58 minutes, 40 seconds)
2025-05-08 07:18:43,410 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 07:18:44,653 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 258.40146 ± 108.197
2025-05-08 07:18:44,653 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [345.40594, 324.75592, 323.43427, 319.93106, 320.3343, 328.21075, 317.92914, 177.5824, 95.61256, 30.81815]
2025-05-08 07:18:44,653 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [155.0, 135.0, 135.0, 133.0, 133.0, 138.0, 132.0, 90.0, 84.0, 34.0]
2025-05-08 07:18:44,660 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 56 minutes, 11 seconds)
2025-05-08 07:21:20,127 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 07:21:21,468 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 270.63971 ± 90.732
2025-05-08 07:21:21,468 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [276.17456, 351.75662, 343.0149, 92.47604, 254.8728, 157.57643, 365.58893, 328.69913, 186.56898, 349.6685]
2025-05-08 07:21:21,468 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [133.0, 159.0, 151.0, 58.0, 119.0, 98.0, 171.0, 135.0, 94.0, 156.0]
2025-05-08 07:21:21,475 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 53 minutes, 50 seconds)
2025-05-08 07:23:55,508 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 07:23:56,909 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 299.06107 ± 60.167
2025-05-08 07:23:56,909 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [212.92007, 350.20493, 321.65497, 323.6069, 323.12122, 316.45386, 321.03595, 154.39906, 347.28247, 319.93112]
2025-05-08 07:23:56,909 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [113.0, 161.0, 132.0, 135.0, 135.0, 133.0, 135.0, 83.0, 163.0, 133.0]
2025-05-08 07:23:56,916 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 51 minutes, 22 seconds)
2025-05-08 07:26:30,238 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 07:26:31,230 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 198.29237 ± 133.727
2025-05-08 07:26:31,230 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [323.61813, 329.91824, 86.62586, 326.43182, 129.74881, 24.909756, 44.713783, 50.110027, 339.85773, 326.98965]
2025-05-08 07:26:31,230 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [135.0, 136.0, 52.0, 136.0, 66.0, 29.0, 45.0, 57.0, 154.0, 134.0]
2025-05-08 07:26:31,237 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 48 minutes, 53 seconds)
2025-05-08 07:29:05,955 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 07:29:07,228 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 251.84265 ± 121.314
2025-05-08 07:29:07,228 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [187.66516, 344.84183, 344.13687, 194.67244, 33.954945, 351.613, 334.81458, 340.72693, 340.44058, 45.560146]
2025-05-08 07:29:07,228 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [105.0, 158.0, 151.0, 106.0, 37.0, 160.0, 145.0, 149.0, 147.0, 28.0]
2025-05-08 07:29:07,235 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 46 minutes, 18 seconds)
2025-05-08 07:31:39,274 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 07:31:40,716 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 327.67242 ± 7.433
2025-05-08 07:31:40,716 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [344.3335, 324.07358, 329.81573, 330.0073, 324.11017, 321.17078, 320.61713, 317.63474, 332.75504, 332.20654]
2025-05-08 07:31:40,716 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [153.0, 133.0, 140.0, 139.0, 132.0, 134.0, 133.0, 129.0, 138.0, 138.0]
2025-05-08 07:31:40,723 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 43 minutes, 28 seconds)
2025-05-08 07:34:14,533 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 07:34:15,771 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 268.16791 ± 113.808
2025-05-08 07:34:15,771 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [311.57007, 43.20055, 331.2544, 329.86832, 331.32278, 38.61172, 325.89804, 328.7871, 315.3755, 325.79047]
2025-05-08 07:34:15,771 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [129.0, 30.0, 145.0, 137.0, 143.0, 57.0, 136.0, 136.0, 131.0, 138.0]
2025-05-08 07:34:15,778 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 40 minutes, 39 seconds)
2025-05-08 07:36:47,438 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 07:36:48,714 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 270.65210 ± 96.751
2025-05-08 07:36:48,714 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [331.75662, 330.42078, 323.07004, 329.14468, 330.0557, 86.30049, 98.80125, 204.75816, 345.4665, 326.74695]
2025-05-08 07:36:48,714 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [141.0, 135.0, 134.0, 136.0, 138.0, 53.0, 74.0, 114.0, 155.0, 135.0]
2025-05-08 07:36:48,722 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 37 minutes, 45 seconds)
2025-05-08 07:39:22,718 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 07:39:24,097 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 289.06729 ± 90.961
2025-05-08 07:39:24,097 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [348.75375, 325.84854, 293.33566, 337.71884, 346.14874, 339.36948, 333.51587, 113.83953, 105.58975, 346.55273]
2025-05-08 07:39:24,097 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [155.0, 136.0, 131.0, 142.0, 159.0, 148.0, 140.0, 64.0, 67.0, 153.0]
2025-05-08 07:39:24,105 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 35 minutes, 19 seconds)
2025-05-08 07:41:57,807 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 07:41:59,007 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 238.11739 ± 117.631
2025-05-08 07:41:59,008 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [84.71562, 42.681595, 346.03683, 327.1719, 308.50378, 336.58792, 108.98525, 155.79428, 340.55527, 330.14124]
2025-05-08 07:41:59,008 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [93.0, 41.0, 155.0, 138.0, 126.0, 144.0, 59.0, 102.0, 149.0, 140.0]
2025-05-08 07:41:59,015 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 32 minutes, 36 seconds)
2025-05-08 07:44:31,706 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 07:44:33,129 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 316.03619 ± 34.804
2025-05-08 07:44:33,130 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [321.5571, 322.59933, 315.49277, 324.54553, 335.73758, 215.04605, 336.03976, 346.1047, 324.17102, 319.0682]
2025-05-08 07:44:33,130 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [131.0, 131.0, 131.0, 138.0, 150.0, 110.0, 136.0, 154.0, 134.0, 132.0]
2025-05-08 07:44:33,137 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 30 minutes, 6 seconds)
2025-05-08 07:47:08,032 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 07:47:09,254 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 248.32356 ± 118.178
2025-05-08 07:47:09,254 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [328.4398, 200.98836, 34.819103, 35.501614, 350.9358, 333.67615, 211.45306, 352.83008, 333.3357, 301.25616]
2025-05-08 07:47:09,254 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [140.0, 98.0, 38.0, 52.0, 161.0, 139.0, 102.0, 161.0, 138.0, 129.0]
2025-05-08 07:47:09,261 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 27 minutes, 39 seconds)
2025-05-08 07:49:42,529 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 07:49:43,935 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 292.70700 ± 83.266
2025-05-08 07:49:43,936 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [334.90466, 355.72125, 334.7435, 158.13179, 369.6644, 334.33157, 339.33005, 126.78182, 342.72653, 230.73445]
2025-05-08 07:49:43,936 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [137.0, 160.0, 144.0, 84.0, 182.0, 143.0, 149.0, 71.0, 153.0, 112.0]
2025-05-08 07:49:43,943 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 25 minutes, 16 seconds)
2025-05-08 07:52:17,861 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 07:52:19,200 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 290.36206 ± 93.164
2025-05-08 07:52:19,200 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [324.51553, 324.5877, 324.3001, 324.17148, 27.651115, 350.50378, 229.45589, 349.3455, 323.4199, 325.66953]
2025-05-08 07:52:19,200 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [137.0, 135.0, 135.0, 133.0, 24.0, 165.0, 107.0, 168.0, 136.0, 136.0]
2025-05-08 07:52:19,208 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 22 minutes, 40 seconds)
2025-05-08 07:54:52,418 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 07:54:53,890 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 296.06570 ± 77.361
2025-05-08 07:54:53,890 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [334.8649, 334.0724, 252.65231, 355.39374, 343.94757, 199.79373, 117.36041, 373.41476, 338.50507, 310.65204]
2025-05-08 07:54:53,891 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [141.0, 143.0, 129.0, 162.0, 151.0, 103.0, 90.0, 186.0, 147.0, 139.0]
2025-05-08 07:54:53,898 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 20 minutes, 4 seconds)
2025-05-08 07:57:27,860 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 07:57:29,184 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 291.55417 ± 74.366
2025-05-08 07:57:29,184 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [339.4443, 326.36465, 97.94643, 204.93484, 319.85083, 327.47424, 324.9364, 315.29834, 321.2835, 338.0083]
2025-05-08 07:57:29,184 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [151.0, 134.0, 61.0, 102.0, 133.0, 133.0, 132.0, 131.0, 135.0, 148.0]
2025-05-08 07:57:29,192 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 17 minutes, 36 seconds)
2025-05-08 08:00:02,891 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 08:00:04,239 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 278.48254 ± 115.411
2025-05-08 08:00:04,239 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [331.9041, 151.50471, 374.41336, 333.61554, 330.04587, 334.04333, 326.45886, 161.17343, 29.103142, 412.5632]
2025-05-08 08:00:04,239 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [136.0, 78.0, 179.0, 137.0, 137.0, 135.0, 133.0, 83.0, 31.0, 220.0]
2025-05-08 08:00:04,247 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 14 minutes, 54 seconds)
2025-05-08 08:02:37,339 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 08:02:38,818 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 302.88315 ± 93.316
2025-05-08 08:02:38,818 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [29.292643, 296.13437, 332.96985, 381.11646, 339.54837, 339.6156, 325.72928, 324.2241, 328.80685, 331.3941]
2025-05-08 08:02:38,818 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [24.0, 184.0, 152.0, 196.0, 150.0, 146.0, 134.0, 135.0, 137.0, 136.0]
2025-05-08 08:02:38,842 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 12 minutes, 19 seconds)
2025-05-08 08:05:11,245 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 08:05:12,560 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 293.28458 ± 70.809
2025-05-08 08:05:12,560 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [328.87152, 160.81949, 346.31335, 333.1961, 144.40364, 324.57803, 315.79214, 325.23596, 326.74756, 326.88806]
2025-05-08 08:05:12,560 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [136.0, 84.0, 155.0, 142.0, 73.0, 133.0, 127.0, 135.0, 134.0, 135.0]
2025-05-08 08:05:12,568 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 9 minutes, 36 seconds)
2025-05-08 08:07:45,927 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 08:07:47,073 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 238.64177 ± 127.790
2025-05-08 08:07:47,073 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [323.5794, 327.36707, 320.66968, 307.30716, 322.61716, 326.122, 52.809372, 46.73865, 327.4845, 31.72293]
2025-05-08 08:07:47,073 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [135.0, 137.0, 133.0, 126.0, 136.0, 137.0, 38.0, 52.0, 155.0, 35.0]
2025-05-08 08:07:47,082 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 7 minutes)
2025-05-08 08:10:21,014 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 08:10:22,405 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 280.84943 ± 91.828
2025-05-08 08:10:22,405 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [319.61697, 345.12204, 325.67737, 323.47232, 326.96237, 326.1081, 198.48654, 259.338, 37.57866, 346.13196]
2025-05-08 08:10:22,406 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [164.0, 151.0, 132.0, 134.0, 137.0, 135.0, 100.0, 172.0, 37.0, 155.0]
2025-05-08 08:10:22,414 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 4 minutes, 26 seconds)
2025-05-08 08:12:55,341 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 08:12:56,382 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 210.53442 ± 128.724
2025-05-08 08:12:56,382 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [323.72116, 31.006573, 323.72534, 115.80302, 355.27258, 331.04956, 78.770325, 338.95047, 161.14877, 45.896503]
2025-05-08 08:12:56,382 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [133.0, 40.0, 134.0, 61.0, 165.0, 135.0, 49.0, 151.0, 84.0, 47.0]
2025-05-08 08:12:56,390 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 1 minute, 46 seconds)
2025-05-08 08:15:30,996 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 08:15:32,342 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 296.02347 ± 88.287
2025-05-08 08:15:32,342 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [323.72903, 323.23114, 317.0731, 324.74677, 326.80093, 326.37112, 306.88397, 341.4, 32.564392, 337.43414]
2025-05-08 08:15:32,342 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [133.0, 133.0, 130.0, 135.0, 136.0, 134.0, 128.0, 158.0, 35.0, 149.0]
2025-05-08 08:15:32,351 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 78/100 (estimated time remaining: 59 minutes, 18 seconds)
2025-05-08 08:18:07,281 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 08:18:08,485 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 255.15292 ± 124.655
2025-05-08 08:18:08,485 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [325.30658, 321.58188, 26.957544, 367.96017, 57.656418, 324.76025, 121.50808, 349.10364, 327.32935, 329.36514]
2025-05-08 08:18:08,485 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [135.0, 134.0, 22.0, 181.0, 41.0, 135.0, 63.0, 159.0, 136.0, 141.0]
2025-05-08 08:18:08,494 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 79/100 (estimated time remaining: 56 minutes, 54 seconds)
2025-05-08 08:20:43,797 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 08:20:44,914 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 206.80350 ± 128.938
2025-05-08 08:20:44,914 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [333.45816, 328.61874, 84.28879, 264.53845, 39.320297, 375.07272, 295.177, 25.053137, 72.744194, 249.7634]
2025-05-08 08:20:44,914 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [141.0, 136.0, 53.0, 130.0, 44.0, 185.0, 133.0, 26.0, 74.0, 129.0]
2025-05-08 08:20:44,923 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 80/100 (estimated time remaining: 54 minutes, 26 seconds)
2025-05-08 08:23:20,158 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 08:23:21,500 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 302.06213 ± 91.161
2025-05-08 08:23:21,500 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [352.8954, 330.49716, 315.8308, 329.01456, 326.09552, 329.68167, 30.18995, 332.51413, 346.3998, 327.50217]
2025-05-08 08:23:21,500 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [160.0, 134.0, 129.0, 134.0, 132.0, 136.0, 21.0, 143.0, 156.0, 133.0]
2025-05-08 08:23:21,509 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 81/100 (estimated time remaining: 51 minutes, 56 seconds)
2025-05-08 08:25:56,231 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 08:25:57,348 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 221.02649 ± 139.967
2025-05-08 08:25:57,348 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [364.49808, 329.64746, 326.37265, 333.537, 322.93134, 44.810158, 41.76215, 332.04636, 69.12257, 45.537228]
2025-05-08 08:25:57,348 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [173.0, 137.0, 139.0, 140.0, 135.0, 49.0, 52.0, 143.0, 49.0, 46.0]
2025-05-08 08:25:57,358 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 82/100 (estimated time remaining: 49 minutes, 27 seconds)
2025-05-08 08:28:33,029 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 08:28:34,290 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 248.13457 ± 115.415
2025-05-08 08:28:34,290 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [77.38555, 369.38416, 225.39879, 167.40036, 342.5128, 325.6861, 145.89616, 356.84195, 85.75473, 385.08514]
2025-05-08 08:28:34,290 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [46.0, 178.0, 110.0, 91.0, 152.0, 135.0, 77.0, 167.0, 51.0, 194.0]
2025-05-08 08:28:34,300 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 83/100 (estimated time remaining: 46 minutes, 55 seconds)
2025-05-08 08:31:09,572 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 08:31:11,096 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 313.82343 ± 75.911
2025-05-08 08:31:11,096 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [332.15646, 329.24362, 321.1247, 334.46524, 459.15262, 177.21968, 188.52042, 339.61023, 327.81467, 328.92667]
2025-05-08 08:31:11,096 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [144.0, 137.0, 134.0, 143.0, 263.0, 99.0, 97.0, 150.0, 135.0, 134.0]
2025-05-08 08:31:11,105 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 84/100 (estimated time remaining: 44 minutes, 20 seconds)
2025-05-08 08:33:44,820 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 08:33:46,215 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 289.78506 ± 61.882
2025-05-08 08:33:46,215 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [323.6041, 335.60767, 314.7936, 169.26308, 352.28247, 329.12595, 308.91522, 205.1422, 221.65138, 337.46487]
2025-05-08 08:33:46,215 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [134.0, 140.0, 130.0, 89.0, 176.0, 137.0, 127.0, 114.0, 133.0, 147.0]
2025-05-08 08:33:46,224 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 85/100 (estimated time remaining: 41 minutes, 40 seconds)
2025-05-08 08:36:20,798 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 08:36:22,255 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 330.95538 ± 7.927
2025-05-08 08:36:22,255 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [352.25912, 330.17566, 327.38745, 333.83334, 322.8874, 334.51413, 329.14578, 324.5982, 325.67322, 329.07938]
2025-05-08 08:36:22,255 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [164.0, 134.0, 137.0, 142.0, 132.0, 139.0, 135.0, 134.0, 134.0, 136.0]
2025-05-08 08:36:22,255 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1124 [INFO]: New best (330.96) for latency ExtremeSparseL4U32
2025-05-08 08:36:22,256 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1127 [INFO]: saving network
2025-05-08 08:36:22,260 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-hopper/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 08:36:22,274 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 86/100 (estimated time remaining: 39 minutes, 2 seconds)
2025-05-08 08:38:58,396 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 08:38:59,857 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 313.44281 ± 73.308
2025-05-08 08:38:59,857 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [345.3268, 100.37611, 316.519, 327.69513, 381.8598, 324.56647, 324.79044, 328.4872, 354.16675, 330.64038]
2025-05-08 08:38:59,858 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [154.0, 60.0, 133.0, 137.0, 193.0, 135.0, 134.0, 135.0, 160.0, 141.0]
2025-05-08 08:38:59,867 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 87/100 (estimated time remaining: 36 minutes, 31 seconds)
2025-05-08 08:41:35,131 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 08:41:36,491 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 300.59329 ± 64.514
2025-05-08 08:41:36,491 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [330.4729, 152.84778, 340.35422, 335.60413, 335.97028, 329.31363, 325.52737, 324.66046, 193.60298, 337.5789]
2025-05-08 08:41:36,491 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [138.0, 80.0, 152.0, 142.0, 142.0, 138.0, 133.0, 135.0, 95.0, 141.0]
2025-05-08 08:41:36,501 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 88/100 (estimated time remaining: 33 minutes, 53 seconds)
2025-05-08 08:44:11,164 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 08:44:12,501 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 306.77695 ± 56.650
2025-05-08 08:44:12,502 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [327.18317, 332.98883, 323.4205, 326.09262, 314.41354, 322.06442, 137.4426, 330.1768, 328.89056, 325.09653]
2025-05-08 08:44:12,502 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [133.0, 139.0, 133.0, 133.0, 128.0, 133.0, 73.0, 137.0, 133.0, 134.0]
2025-05-08 08:44:12,511 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 89/100 (estimated time remaining: 31 minutes, 15 seconds)
2025-05-08 08:46:48,101 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 08:46:49,658 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 334.76779 ± 22.158
2025-05-08 08:46:49,658 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [329.2241, 345.6145, 304.272, 338.4658, 363.62564, 335.21832, 330.0868, 289.419, 358.62668, 353.12497]
2025-05-08 08:46:49,659 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [136.0, 157.0, 127.0, 149.0, 170.0, 142.0, 138.0, 136.0, 168.0, 160.0]
2025-05-08 08:46:49,659 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1124 [INFO]: New best (334.77) for latency ExtremeSparseL4U32
2025-05-08 08:46:49,659 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1127 [INFO]: saving network
2025-05-08 08:46:49,664 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-hopper/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 08:46:49,678 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 90/100 (estimated time remaining: 28 minutes, 43 seconds)
2025-05-08 08:49:24,840 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 08:49:25,883 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 213.18315 ± 126.764
2025-05-08 08:49:25,884 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [264.8484, 280.04373, 36.16583, 40.245014, 175.198, 19.996212, 330.75812, 327.4181, 324.11053, 333.0476]
2025-05-08 08:49:25,884 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [122.0, 145.0, 37.0, 39.0, 86.0, 27.0, 142.0, 136.0, 133.0, 135.0]
2025-05-08 08:49:25,894 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 91/100 (estimated time remaining: 26 minutes, 7 seconds)
2025-05-08 08:52:01,823 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 08:52:02,946 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 231.97905 ± 136.668
2025-05-08 08:52:02,947 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [274.1301, 327.73688, 318.84918, 33.444496, 333.16013, 323.06815, 25.23152, 320.9137, 16.3791, 346.87723]
2025-05-08 08:52:02,947 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [127.0, 134.0, 131.0, 37.0, 150.0, 135.0, 22.0, 139.0, 34.0, 158.0]
2025-05-08 08:52:02,957 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 92/100 (estimated time remaining: 23 minutes, 29 seconds)
2025-05-08 08:54:39,138 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 08:54:40,408 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 274.94287 ± 98.465
2025-05-08 08:54:40,408 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [323.90402, 324.2322, 322.594, 166.44191, 315.39944, 327.57352, 305.31778, 342.75342, 305.5365, 15.67581]
2025-05-08 08:54:40,408 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [133.0, 133.0, 133.0, 82.0, 160.0, 135.0, 126.0, 153.0, 126.0, 33.0]
2025-05-08 08:54:40,418 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 93/100 (estimated time remaining: 20 minutes, 54 seconds)
2025-05-08 08:57:16,427 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 08:57:17,615 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 235.96187 ± 130.023
2025-05-08 08:57:17,615 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [221.34305, 360.6276, 299.46613, 75.81337, 344.93295, 341.15784, 334.88342, 319.24808, 32.00449, 30.141888]
2025-05-08 08:57:17,615 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [104.0, 168.0, 131.0, 84.0, 154.0, 145.0, 141.0, 130.0, 39.0, 35.0]
2025-05-08 08:57:17,626 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 94/100 (estimated time remaining: 18 minutes, 19 seconds)
2025-05-08 08:59:52,427 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 08:59:53,765 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 289.86105 ± 82.845
2025-05-08 08:59:53,766 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [330.3165, 343.08386, 319.30655, 326.87576, 317.74445, 327.99152, 298.6378, 64.33299, 351.15405, 219.16699]
2025-05-08 08:59:53,766 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [137.0, 146.0, 132.0, 135.0, 130.0, 134.0, 125.0, 61.0, 159.0, 111.0]
2025-05-08 08:59:53,776 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 95/100 (estimated time remaining: 15 minutes, 40 seconds)
2025-05-08 09:02:29,619 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 09:02:31,035 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 315.55475 ± 36.079
2025-05-08 09:02:31,035 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [330.21466, 330.1871, 321.46045, 332.59558, 325.62134, 323.87592, 330.36725, 330.41266, 322.94724, 207.86528]
2025-05-08 09:02:31,035 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [134.0, 137.0, 133.0, 139.0, 134.0, 136.0, 139.0, 137.0, 134.0, 126.0]
2025-05-08 09:02:31,045 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 96/100 (estimated time remaining: 13 minutes, 5 seconds)
2025-05-08 09:05:05,291 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 09:05:06,367 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 215.10666 ± 118.442
2025-05-08 09:05:06,367 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [148.17233, 71.896225, 170.48544, 30.40381, 90.579414, 333.32654, 328.11728, 325.47684, 322.9335, 329.6754]
2025-05-08 09:05:06,367 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [78.0, 88.0, 95.0, 38.0, 50.0, 143.0, 134.0, 133.0, 132.0, 136.0]
2025-05-08 09:05:06,378 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 97/100 (estimated time remaining: 10 minutes, 26 seconds)
2025-05-08 09:07:43,335 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 09:07:44,186 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 145.80571 ± 144.233
2025-05-08 09:07:44,186 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [34.031307, 408.3479, 333.81912, 142.62506, 23.322416, 32.096684, 39.335285, 58.01781, 334.25766, 52.20388]
2025-05-08 09:07:44,186 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [38.0, 213.0, 138.0, 76.0, 25.0, 28.0, 39.0, 55.0, 146.0, 48.0]
2025-05-08 09:07:44,197 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 98/100 (estimated time remaining: 7 minutes, 50 seconds)
2025-05-08 09:10:18,297 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 09:10:19,645 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 304.21222 ± 74.357
2025-05-08 09:10:19,645 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [322.91418, 322.7874, 326.53525, 81.80155, 344.43246, 327.96198, 329.5925, 327.36557, 330.3657, 328.36563]
2025-05-08 09:10:19,645 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [133.0, 134.0, 135.0, 51.0, 156.0, 135.0, 135.0, 134.0, 134.0, 135.0]
2025-05-08 09:10:19,656 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 99/100 (estimated time remaining: 5 minutes, 12 seconds)
2025-05-08 09:12:54,711 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 09:12:56,118 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 290.78406 ± 106.595
2025-05-08 09:12:56,118 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [407.80856, 325.30423, 317.7474, 326.2028, 332.92612, 325.01315, 147.48726, 33.457413, 367.9935, 323.90036]
2025-05-08 09:12:56,118 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [216.0, 132.0, 131.0, 134.0, 146.0, 133.0, 78.0, 38.0, 183.0, 134.0]
2025-05-08 09:12:56,129 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 36 seconds)
2025-05-08 09:15:32,616 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 09:15:33,826 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 244.18167 ± 102.932
2025-05-08 09:15:33,826 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [75.15881, 330.21384, 326.6122, 103.782036, 254.04823, 142.06201, 332.02356, 330.27365, 184.20743, 363.43488]
2025-05-08 09:15:33,826 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [63.0, 139.0, 138.0, 60.0, 127.0, 80.0, 139.0, 142.0, 93.0, 174.0]
2025-05-08 09:15:33,837 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1149 [DEBUG]: Training session finished
