2025-05-06 13:23:12,187 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1006 [DEBUG]: logdir: _logs/benchmark-v3-tc3/noisy-hopper/SparseU15-sac-aug-mem32
2025-05-06 13:23:12,187 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1007 [DEBUG]: trainer_prefix: benchmark-v3-tc3/noisy-hopper/SparseU15-sac-aug-mem32
2025-05-06 13:23:12,187 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1008 [DEBUG]: args.trainer_eval_latencies: {'SparseU15': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x77ed393c4d00>}
2025-05-06 13:23:12,187 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1009 [DEBUG]: using device: cpu
2025-05-06 13:23:12,192 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1031 [INFO]: Creating new trainer
2025-05-06 13:23:12,198 baseline-sac-noisy-hopper:105 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=107, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2.]]), shift: tensor([[-1., -1., -1.]]))
)
2025-05-06 13:23:12,198 baseline-sac-noisy-hopper:106 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=110, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-06 13:23:12,412 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1092 [DEBUG]: Starting training session...
2025-05-06 13:23:12,412 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 1/100
2025-05-06 13:25:47,815 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 13:25:49,624 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 76.45604 ± 28.539
2025-05-06 13:25:49,625 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [76.35519, 145.36371, 64.36495, 80.25746, 81.311104, 80.60997, 23.54917, 77.23804, 56.714462, 78.79633]
2025-05-06 13:25:49,625 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [82.0, 108.0, 82.0, 72.0, 84.0, 77.0, 28.0, 79.0, 75.0, 84.0]
2025-05-06 13:25:49,625 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1124 [INFO]: New best (76.46) for latency SparseU15
2025-05-06 13:25:49,625 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1127 [INFO]: saving network
2025-05-06 13:25:49,629 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc3/noisy-hopper/SparseU15-sac-aug-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 13:25:49,634 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 2/100 (estimated time remaining: 4 hours, 19 minutes, 24 seconds)
2025-05-06 13:28:32,021 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 13:28:33,642 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 85.60816 ± 39.708
2025-05-06 13:28:33,642 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [167.69649, 82.0213, 96.113976, 72.56098, 44.672245, 65.27361, 56.43894, 148.24126, 42.654633, 80.40812]
2025-05-06 13:28:33,642 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [110.0, 63.0, 80.0, 68.0, 50.0, 62.0, 50.0, 99.0, 47.0, 66.0]
2025-05-06 13:28:33,642 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1124 [INFO]: New best (85.61) for latency SparseU15
2025-05-06 13:28:33,642 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1127 [INFO]: saving network
2025-05-06 13:28:33,646 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc3/noisy-hopper/SparseU15-sac-aug-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 13:28:33,652 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 3/100 (estimated time remaining: 4 hours, 22 minutes, 20 seconds)
2025-05-06 13:31:14,303 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 13:31:15,655 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 77.06392 ± 17.125
2025-05-06 13:31:15,655 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [89.79924, 28.204199, 77.78313, 72.3924, 84.07061, 83.11652, 88.83796, 87.75373, 81.79053, 76.89094]
2025-05-06 13:31:15,656 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [65.0, 31.0, 60.0, 53.0, 62.0, 63.0, 66.0, 59.0, 59.0, 58.0]
2025-05-06 13:31:15,657 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 4/100 (estimated time remaining: 4 hours, 20 minutes, 24 seconds)
2025-05-06 13:33:58,035 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 13:33:59,667 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 102.09590 ± 36.319
2025-05-06 13:33:59,667 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [105.591934, 91.448685, 92.71439, 142.7042, 137.33136, 23.68746, 87.50488, 78.73335, 102.75679, 158.48596]
2025-05-06 13:33:59,667 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [72.0, 67.0, 63.0, 90.0, 94.0, 25.0, 67.0, 57.0, 68.0, 94.0]
2025-05-06 13:33:59,667 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1124 [INFO]: New best (102.10) for latency SparseU15
2025-05-06 13:33:59,667 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1127 [INFO]: saving network
2025-05-06 13:33:59,671 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc3/noisy-hopper/SparseU15-sac-aug-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 13:33:59,694 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 5/100 (estimated time remaining: 4 hours, 18 minutes, 54 seconds)
2025-05-06 13:36:42,072 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 13:36:44,309 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 116.14011 ± 34.374
2025-05-06 13:36:44,309 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [82.18236, 120.677895, 88.42599, 116.88066, 156.92825, 142.02538, 117.60666, 93.705475, 62.420166, 180.54828]
2025-05-06 13:36:44,309 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [79.0, 92.0, 79.0, 92.0, 103.0, 107.0, 86.0, 86.0, 69.0, 154.0]
2025-05-06 13:36:44,309 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1124 [INFO]: New best (116.14) for latency SparseU15
2025-05-06 13:36:44,309 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1127 [INFO]: saving network
2025-05-06 13:36:44,313 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc3/noisy-hopper/SparseU15-sac-aug-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 13:36:44,319 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 6/100 (estimated time remaining: 4 hours, 17 minutes, 6 seconds)
2025-05-06 13:39:33,159 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 13:39:34,875 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 103.25980 ± 17.738
2025-05-06 13:39:34,876 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [66.13373, 93.39219, 119.94108, 93.34532, 130.64157, 117.351494, 87.65724, 103.333244, 110.76711, 110.03501]
2025-05-06 13:39:34,876 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [56.0, 67.0, 82.0, 71.0, 86.0, 83.0, 62.0, 67.0, 80.0, 80.0]
2025-05-06 13:39:34,877 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 7/100 (estimated time remaining: 4 hours, 18 minutes, 34 seconds)
2025-05-06 13:42:16,953 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 13:42:19,099 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 140.48235 ± 75.685
2025-05-06 13:42:19,100 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [255.55444, 24.186972, 147.04564, 106.16301, 275.47083, 176.27234, 106.55183, 141.21214, 123.22143, 49.14481]
2025-05-06 13:42:19,100 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [141.0, 27.0, 86.0, 75.0, 166.0, 106.0, 76.0, 92.0, 97.0, 46.0]
2025-05-06 13:42:19,100 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1124 [INFO]: New best (140.48) for latency SparseU15
2025-05-06 13:42:19,100 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1127 [INFO]: saving network
2025-05-06 13:42:19,104 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc3/noisy-hopper/SparseU15-sac-aug-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 13:42:19,110 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 8/100 (estimated time remaining: 4 hours, 15 minutes, 53 seconds)
2025-05-06 13:45:00,663 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 13:45:02,488 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 110.88042 ± 42.350
2025-05-06 13:45:02,488 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [89.76203, 69.98055, 92.38159, 124.03055, 103.662506, 126.93241, 145.065, 161.93173, 23.838312, 171.21944]
2025-05-06 13:45:02,488 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [82.0, 61.0, 74.0, 95.0, 72.0, 80.0, 87.0, 103.0, 27.0, 100.0]
2025-05-06 13:45:02,490 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 9/100 (estimated time remaining: 4 hours, 13 minutes, 33 seconds)
2025-05-06 13:47:45,095 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 13:47:47,001 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 116.61522 ± 38.860
2025-05-06 13:47:47,002 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [141.84792, 26.777193, 137.24821, 114.35386, 93.55536, 189.04007, 112.90573, 127.49129, 111.35275, 111.57986]
2025-05-06 13:47:47,002 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [97.0, 28.0, 80.0, 82.0, 80.0, 119.0, 79.0, 88.0, 79.0, 78.0]
2025-05-06 13:47:47,004 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 10/100 (estimated time remaining: 4 hours, 10 minutes, 57 seconds)
2025-05-06 13:50:27,890 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 13:50:29,942 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 124.50700 ± 49.634
2025-05-06 13:50:29,943 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [131.57088, 114.93195, 122.962364, 29.697136, 92.103226, 177.52075, 156.71547, 74.31403, 214.1879, 131.06645]
2025-05-06 13:50:29,943 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [93.0, 80.0, 85.0, 28.0, 77.0, 116.0, 100.0, 67.0, 138.0, 93.0]
2025-05-06 13:50:29,945 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 11/100 (estimated time remaining: 4 hours, 7 minutes, 41 seconds)
2025-05-06 13:53:12,714 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 13:53:14,804 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 138.95903 ± 66.780
2025-05-06 13:53:14,804 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [219.0339, 28.866474, 142.67198, 174.01591, 179.58508, 109.677, 109.15596, 23.33242, 211.6573, 191.59422]
2025-05-06 13:53:14,804 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [122.0, 28.0, 99.0, 113.0, 106.0, 71.0, 90.0, 29.0, 118.0, 117.0]
2025-05-06 13:53:14,806 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 12/100 (estimated time remaining: 4 hours, 3 minutes, 14 seconds)
2025-05-06 13:55:55,316 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 13:55:57,386 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 131.83716 ± 67.745
2025-05-06 13:55:57,386 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [97.867905, 175.12863, 123.94377, 125.446594, 253.80093, 188.70331, 25.920681, 134.65562, 22.54996, 170.35414]
2025-05-06 13:55:57,386 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [83.0, 117.0, 88.0, 91.0, 135.0, 114.0, 30.0, 97.0, 27.0, 103.0]
2025-05-06 13:55:57,388 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 13/100 (estimated time remaining: 4 hours, 1 second)
2025-05-06 13:58:38,785 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 13:58:41,373 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 177.55891 ± 47.220
2025-05-06 13:58:41,373 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [158.9246, 204.0115, 119.36307, 222.69052, 101.87067, 222.70795, 214.90105, 119.330345, 177.01079, 234.77863]
2025-05-06 13:58:41,373 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [112.0, 132.0, 83.0, 115.0, 81.0, 145.0, 121.0, 85.0, 99.0, 130.0]
2025-05-06 13:58:41,374 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1124 [INFO]: New best (177.56) for latency SparseU15
2025-05-06 13:58:41,374 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1127 [INFO]: saving network
2025-05-06 13:58:41,378 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc3/noisy-hopper/SparseU15-sac-aug-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 13:58:41,403 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 14/100 (estimated time remaining: 3 hours, 57 minutes, 29 seconds)
2025-05-06 14:01:22,977 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 14:01:25,035 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 125.39548 ± 64.623
2025-05-06 14:01:25,035 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [25.54003, 236.02983, 127.72145, 167.29987, 119.63177, 110.50752, 125.26395, 211.0264, 26.620886, 104.31329]
2025-05-06 14:01:25,035 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [28.0, 128.0, 90.0, 113.0, 100.0, 87.0, 90.0, 134.0, 28.0, 78.0]
2025-05-06 14:01:25,037 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 15/100 (estimated time remaining: 3 hours, 54 minutes, 30 seconds)
2025-05-06 14:04:06,102 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 14:04:08,426 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 152.87338 ± 52.345
2025-05-06 14:04:08,426 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [101.141884, 269.24762, 105.559654, 195.8944, 148.25323, 167.28378, 140.43263, 137.07787, 79.10228, 184.74063]
2025-05-06 14:04:08,427 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [72.0, 136.0, 75.0, 129.0, 89.0, 105.0, 97.0, 91.0, 86.0, 106.0]
2025-05-06 14:04:08,429 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 16/100 (estimated time remaining: 3 hours, 51 minutes, 54 seconds)
2025-05-06 14:06:50,972 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 14:06:53,688 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 182.39146 ± 70.630
2025-05-06 14:06:53,688 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [233.19913, 71.30379, 120.63449, 189.94762, 124.90068, 274.10187, 101.8848, 246.34615, 185.41386, 276.18222]
2025-05-06 14:06:53,688 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [142.0, 72.0, 87.0, 143.0, 87.0, 148.0, 78.0, 133.0, 114.0, 150.0]
2025-05-06 14:06:53,688 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1124 [INFO]: New best (182.39) for latency SparseU15
2025-05-06 14:06:53,689 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1127 [INFO]: saving network
2025-05-06 14:06:53,692 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc3/noisy-hopper/SparseU15-sac-aug-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 14:06:53,700 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 17/100 (estimated time remaining: 3 hours, 49 minutes, 17 seconds)
2025-05-06 14:09:35,408 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 14:09:37,679 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 153.30022 ± 71.746
2025-05-06 14:09:37,679 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [283.58176, 124.22975, 182.71933, 112.89316, 138.77864, 118.84073, 132.64497, 155.39716, 21.394001, 262.5227]
2025-05-06 14:09:37,679 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [158.0, 84.0, 116.0, 90.0, 98.0, 78.0, 91.0, 96.0, 24.0, 132.0]
2025-05-06 14:09:37,682 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 18/100 (estimated time remaining: 3 hours, 46 minutes, 56 seconds)
2025-05-06 14:12:19,038 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 14:12:21,795 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 200.35495 ± 75.162
2025-05-06 14:12:21,796 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [136.21687, 23.313347, 205.93927, 226.00565, 264.09058, 258.79904, 292.97403, 244.59462, 152.06412, 199.55211]
2025-05-06 14:12:21,796 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [98.0, 30.0, 128.0, 139.0, 142.0, 133.0, 155.0, 126.0, 96.0, 122.0]
2025-05-06 14:12:21,796 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1124 [INFO]: New best (200.35) for latency SparseU15
2025-05-06 14:12:21,796 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1127 [INFO]: saving network
2025-05-06 14:12:21,800 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc3/noisy-hopper/SparseU15-sac-aug-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 14:12:21,808 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 19/100 (estimated time remaining: 3 hours, 44 minutes, 14 seconds)
2025-05-06 14:15:04,429 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 14:15:06,441 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 131.32875 ± 81.110
2025-05-06 14:15:06,441 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [198.47368, 199.46034, 37.881786, 116.929565, 30.67828, 108.621315, 196.82295, 279.01196, 117.85922, 27.548399]
2025-05-06 14:15:06,441 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [119.0, 120.0, 40.0, 94.0, 32.0, 79.0, 115.0, 140.0, 86.0, 32.0]
2025-05-06 14:15:06,444 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 20/100 (estimated time remaining: 3 hours, 41 minutes, 46 seconds)
2025-05-06 14:17:49,638 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 14:17:51,721 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 143.85583 ± 48.078
2025-05-06 14:17:51,722 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [139.66228, 171.22446, 232.8294, 149.10918, 219.79391, 107.77299, 85.851845, 108.569244, 93.42236, 130.3228]
2025-05-06 14:17:51,722 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [86.0, 98.0, 114.0, 95.0, 126.0, 76.0, 62.0, 72.0, 71.0, 90.0]
2025-05-06 14:17:51,725 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 21/100 (estimated time remaining: 3 hours, 39 minutes, 32 seconds)
2025-05-06 14:20:34,095 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 14:20:36,327 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 145.84042 ± 63.063
2025-05-06 14:20:36,327 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [111.33312, 224.34515, 121.724, 84.65652, 61.95472, 110.31422, 99.63673, 209.59222, 176.53476, 258.31277]
2025-05-06 14:20:36,327 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [75.0, 129.0, 87.0, 70.0, 59.0, 76.0, 84.0, 119.0, 118.0, 135.0]
2025-05-06 14:20:36,331 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 22/100 (estimated time remaining: 3 hours, 36 minutes, 37 seconds)
2025-05-06 14:23:19,505 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 14:23:21,760 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 163.52905 ± 74.021
2025-05-06 14:23:21,760 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [193.76248, 26.624866, 187.80637, 244.49393, 93.676735, 116.723724, 188.46783, 87.162796, 241.29843, 255.27335]
2025-05-06 14:23:21,760 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [107.0, 32.0, 107.0, 127.0, 67.0, 85.0, 116.0, 65.0, 124.0, 129.0]
2025-05-06 14:23:21,763 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 23/100 (estimated time remaining: 3 hours, 34 minutes, 15 seconds)
2025-05-06 14:26:04,222 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 14:26:06,777 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 186.08369 ± 42.706
2025-05-06 14:26:06,777 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [145.88739, 195.61502, 194.5598, 145.47548, 147.4624, 230.31494, 269.83713, 219.4741, 129.31197, 182.89862]
2025-05-06 14:26:06,777 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [94.0, 108.0, 113.0, 93.0, 98.0, 121.0, 145.0, 108.0, 97.0, 112.0]
2025-05-06 14:26:06,781 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 24/100 (estimated time remaining: 3 hours, 31 minutes, 44 seconds)
2025-05-06 14:28:50,156 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 14:28:52,498 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 174.18498 ± 76.597
2025-05-06 14:28:52,498 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [21.274796, 287.02274, 108.9347, 205.5702, 202.09735, 195.02615, 116.264694, 275.93515, 198.17624, 131.54773]
2025-05-06 14:28:52,498 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [22.0, 153.0, 78.0, 107.0, 120.0, 108.0, 79.0, 135.0, 111.0, 82.0]
2025-05-06 14:28:52,502 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 25/100 (estimated time remaining: 3 hours, 29 minutes, 16 seconds)
2025-05-06 14:31:34,253 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 14:31:36,430 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 156.39336 ± 84.358
2025-05-06 14:31:36,431 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [171.58911, 28.85784, 26.811407, 204.65782, 89.070694, 257.7537, 94.59337, 216.39946, 249.18883, 225.01135]
2025-05-06 14:31:36,431 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [111.0, 31.0, 29.0, 106.0, 65.0, 128.0, 71.0, 120.0, 128.0, 137.0]
2025-05-06 14:31:36,435 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 26/100 (estimated time remaining: 3 hours, 26 minutes, 10 seconds)
2025-05-06 14:34:20,068 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 14:34:22,634 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 179.79440 ± 46.682
2025-05-06 14:34:22,635 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [154.6266, 221.17897, 198.77774, 269.4022, 94.949234, 189.45457, 182.55272, 193.83742, 120.57412, 172.59041]
2025-05-06 14:34:22,635 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [102.0, 123.0, 113.0, 150.0, 74.0, 106.0, 104.0, 124.0, 91.0, 103.0]
2025-05-06 14:34:22,639 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 27/100 (estimated time remaining: 3 hours, 23 minutes, 49 seconds)
2025-05-06 14:37:04,785 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 14:37:07,371 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 185.11511 ± 60.824
2025-05-06 14:37:07,371 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [182.0225, 73.37963, 231.67326, 176.56433, 168.27435, 248.23563, 258.9261, 103.03716, 254.79086, 154.24725]
2025-05-06 14:37:07,371 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [104.0, 63.0, 125.0, 97.0, 103.0, 131.0, 144.0, 83.0, 137.0, 105.0]
2025-05-06 14:37:07,375 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 28/100 (estimated time remaining: 3 hours, 20 minutes, 53 seconds)
2025-05-06 14:39:50,940 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 14:39:53,667 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 204.31200 ± 67.349
2025-05-06 14:39:53,668 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [289.82642, 182.41692, 110.9183, 290.98444, 233.80875, 125.68519, 261.12836, 259.79053, 124.69646, 163.86462]
2025-05-06 14:39:53,668 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [138.0, 101.0, 75.0, 149.0, 136.0, 96.0, 127.0, 133.0, 91.0, 111.0]
2025-05-06 14:39:53,668 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1124 [INFO]: New best (204.31) for latency SparseU15
2025-05-06 14:39:53,668 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1127 [INFO]: saving network
2025-05-06 14:39:53,672 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc3/noisy-hopper/SparseU15-sac-aug-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 14:39:53,681 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 29/100 (estimated time remaining: 3 hours, 18 minutes, 27 seconds)
2025-05-06 14:42:36,123 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 14:42:38,682 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 177.66536 ± 53.380
2025-05-06 14:42:38,682 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [170.62152, 244.87692, 162.96378, 215.90004, 202.20181, 139.04465, 244.414, 96.486595, 211.6661, 88.47809]
2025-05-06 14:42:38,682 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [112.0, 131.0, 98.0, 132.0, 122.0, 89.0, 144.0, 70.0, 127.0, 62.0]
2025-05-06 14:42:38,687 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 30/100 (estimated time remaining: 3 hours, 15 minutes, 31 seconds)
2025-05-06 14:45:21,405 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 14:45:24,124 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 213.08813 ± 45.294
2025-05-06 14:45:24,124 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [224.09962, 259.6547, 273.29028, 252.89644, 121.94124, 187.05922, 168.03773, 184.34229, 212.19395, 247.36581]
2025-05-06 14:45:24,124 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [118.0, 139.0, 143.0, 118.0, 80.0, 117.0, 97.0, 106.0, 113.0, 128.0]
2025-05-06 14:45:24,125 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1124 [INFO]: New best (213.09) for latency SparseU15
2025-05-06 14:45:24,125 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1127 [INFO]: saving network
2025-05-06 14:45:24,129 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc3/noisy-hopper/SparseU15-sac-aug-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 14:45:24,138 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 31/100 (estimated time remaining: 3 hours, 13 minutes, 7 seconds)
2025-05-06 14:48:06,692 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 14:48:09,264 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 201.57458 ± 48.450
2025-05-06 14:48:09,264 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [146.72449, 206.58704, 218.80171, 211.22606, 259.241, 248.41942, 149.5676, 238.40279, 232.681, 104.09472]
2025-05-06 14:48:09,264 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [88.0, 109.0, 108.0, 117.0, 136.0, 124.0, 101.0, 128.0, 116.0, 70.0]
2025-05-06 14:48:09,269 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 32/100 (estimated time remaining: 3 hours, 10 minutes, 7 seconds)
2025-05-06 14:50:52,126 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 14:50:54,273 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 148.74228 ± 67.715
2025-05-06 14:50:54,273 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [257.83188, 221.59575, 154.31097, 223.78972, 64.64365, 27.296711, 144.63205, 127.292206, 142.7401, 123.28963]
2025-05-06 14:50:54,273 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [136.0, 115.0, 98.0, 123.0, 58.0, 29.0, 95.0, 85.0, 96.0, 81.0]
2025-05-06 14:50:54,278 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 33/100 (estimated time remaining: 3 hours, 7 minutes, 25 seconds)
2025-05-06 14:53:37,078 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 14:53:39,113 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 138.63406 ± 81.785
2025-05-06 14:53:39,114 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [171.74419, 45.69496, 85.109634, 205.76004, 176.58963, 169.5768, 26.180035, 24.676895, 262.84802, 218.16042]
2025-05-06 14:53:39,114 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [103.0, 42.0, 65.0, 118.0, 102.0, 108.0, 29.0, 31.0, 145.0, 124.0]
2025-05-06 14:53:39,119 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 34/100 (estimated time remaining: 3 hours, 4 minutes, 20 seconds)
2025-05-06 14:56:22,003 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 14:56:24,613 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 205.27644 ± 71.838
2025-05-06 14:56:24,613 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [242.52588, 234.99805, 116.63137, 24.164991, 226.97496, 247.92728, 239.71442, 262.72855, 208.33243, 248.76643]
2025-05-06 14:56:24,613 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [127.0, 121.0, 89.0, 25.0, 117.0, 120.0, 131.0, 130.0, 122.0, 133.0]
2025-05-06 14:56:24,618 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 35/100 (estimated time remaining: 3 hours, 1 minute, 42 seconds)
2025-05-06 14:59:08,467 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 14:59:11,238 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 212.61577 ± 74.774
2025-05-06 14:59:11,238 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [272.24448, 20.57591, 213.92067, 128.48697, 248.07195, 245.04175, 236.27603, 246.74918, 244.09413, 270.69662]
2025-05-06 14:59:11,238 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [146.0, 26.0, 114.0, 88.0, 139.0, 135.0, 121.0, 132.0, 131.0, 142.0]
2025-05-06 14:59:11,244 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 59 minutes, 12 seconds)
2025-05-06 15:01:53,682 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 15:01:56,247 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 182.13722 ± 75.001
2025-05-06 15:01:56,248 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [164.24362, 160.05656, 175.63611, 241.44452, 128.99057, 25.550547, 276.85547, 247.12718, 127.74757, 273.72012]
2025-05-06 15:01:56,248 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [104.0, 103.0, 108.0, 129.0, 99.0, 27.0, 154.0, 126.0, 99.0, 144.0]
2025-05-06 15:01:56,253 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 56 minutes, 25 seconds)
2025-05-06 15:04:39,165 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 15:04:41,793 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 203.16837 ± 92.350
2025-05-06 15:04:41,793 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [271.9278, 24.890043, 263.7675, 212.93626, 304.92917, 195.26495, 240.0845, 228.79158, 257.8975, 31.19449]
2025-05-06 15:04:41,793 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [147.0, 27.0, 141.0, 113.0, 157.0, 116.0, 129.0, 118.0, 140.0, 32.0]
2025-05-06 15:04:41,798 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 53 minutes, 46 seconds)
2025-05-06 15:07:26,185 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 15:07:29,365 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 248.37651 ± 18.695
2025-05-06 15:07:29,365 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [198.09877, 248.49208, 251.0671, 253.81738, 240.37859, 253.30603, 265.88913, 258.05634, 269.19153, 245.46828]
2025-05-06 15:07:29,365 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [122.0, 129.0, 137.0, 134.0, 121.0, 133.0, 147.0, 144.0, 152.0, 131.0]
2025-05-06 15:07:29,365 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1124 [INFO]: New best (248.38) for latency SparseU15
2025-05-06 15:07:29,365 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1127 [INFO]: saving network
2025-05-06 15:07:29,369 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc3/noisy-hopper/SparseU15-sac-aug-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 15:07:29,379 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 51 minutes, 35 seconds)
2025-05-06 15:10:12,689 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 15:10:15,476 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 218.92207 ± 64.803
2025-05-06 15:10:15,476 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [292.09344, 285.32187, 113.067444, 172.24348, 224.40295, 278.92535, 103.32555, 255.56056, 223.11252, 241.16754]
2025-05-06 15:10:15,476 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [144.0, 141.0, 74.0, 97.0, 119.0, 132.0, 77.0, 138.0, 123.0, 135.0]
2025-05-06 15:10:15,481 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 48 minutes, 56 seconds)
2025-05-06 15:12:56,994 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 15:12:59,717 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 200.03912 ± 70.204
2025-05-06 15:12:59,717 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [262.64682, 233.54924, 166.66245, 125.02102, 289.69614, 242.77313, 292.78543, 124.97909, 175.33923, 86.93871]
2025-05-06 15:12:59,717 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [138.0, 133.0, 99.0, 84.0, 156.0, 134.0, 152.0, 87.0, 104.0, 75.0]
2025-05-06 15:12:59,723 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 41/100 (estimated time remaining: 2 hours, 45 minutes, 41 seconds)
2025-05-06 15:15:42,163 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 15:15:44,486 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 167.65984 ± 70.780
2025-05-06 15:15:44,487 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [135.77931, 135.1685, 250.9015, 147.96674, 111.4385, 25.827242, 204.24667, 263.22473, 249.67578, 152.36935]
2025-05-06 15:15:44,487 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [88.0, 97.0, 127.0, 93.0, 79.0, 32.0, 117.0, 141.0, 127.0, 91.0]
2025-05-06 15:15:44,492 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 42/100 (estimated time remaining: 2 hours, 42 minutes, 53 seconds)
2025-05-06 15:18:28,476 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 15:18:31,064 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 195.64255 ± 75.142
2025-05-06 15:18:31,064 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [30.038403, 263.674, 283.98807, 177.143, 130.58675, 227.74413, 288.01062, 221.9095, 159.61336, 173.71764]
2025-05-06 15:18:31,065 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [31.0, 136.0, 145.0, 104.0, 88.0, 123.0, 142.0, 116.0, 107.0, 114.0]
2025-05-06 15:18:31,070 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 43/100 (estimated time remaining: 2 hours, 40 minutes, 19 seconds)
2025-05-06 15:21:13,369 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 15:21:15,622 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 170.22263 ± 84.422
2025-05-06 15:21:15,622 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [214.25038, 203.59949, 242.1237, 100.88238, 85.21419, 187.05087, 253.78214, 88.64463, 299.35312, 27.325373]
2025-05-06 15:21:15,622 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [107.0, 112.0, 124.0, 72.0, 62.0, 111.0, 133.0, 70.0, 143.0, 31.0]
2025-05-06 15:21:15,628 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 44/100 (estimated time remaining: 2 hours, 36 minutes, 59 seconds)
2025-05-06 15:23:57,540 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 15:24:00,474 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 236.78731 ± 41.636
2025-05-06 15:24:00,474 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [291.10782, 277.21732, 210.0774, 147.1158, 227.80365, 223.96768, 287.1498, 208.09796, 238.87764, 256.4581]
2025-05-06 15:24:00,474 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [128.0, 138.0, 119.0, 95.0, 121.0, 121.0, 151.0, 118.0, 123.0, 135.0]
2025-05-06 15:24:00,480 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 45/100 (estimated time remaining: 2 hours, 33 minutes, 59 seconds)
2025-05-06 15:26:43,932 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 15:26:46,075 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 146.91774 ± 84.753
2025-05-06 15:26:46,075 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [277.53412, 113.92055, 167.82896, 118.05055, 102.62921, 324.4131, 123.88493, 126.73901, 26.529324, 87.64759]
2025-05-06 15:26:46,075 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [149.0, 82.0, 96.0, 82.0, 75.0, 149.0, 94.0, 86.0, 32.0, 69.0]
2025-05-06 15:26:46,081 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 46/100 (estimated time remaining: 2 hours, 31 minutes, 29 seconds)
2025-05-06 15:29:29,235 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 15:29:32,158 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 237.34277 ± 54.452
2025-05-06 15:29:32,158 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [231.57407, 271.9953, 253.69707, 326.89612, 292.13916, 190.09686, 161.78925, 143.98714, 258.3055, 242.94719]
2025-05-06 15:29:32,159 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [121.0, 135.0, 132.0, 143.0, 145.0, 113.0, 96.0, 93.0, 132.0, 128.0]
2025-05-06 15:29:32,165 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 47/100 (estimated time remaining: 2 hours, 28 minutes, 58 seconds)
2025-05-06 15:32:15,638 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 15:32:18,531 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 223.65935 ± 55.725
2025-05-06 15:32:18,531 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [299.291, 169.2066, 241.41624, 299.31262, 153.79123, 207.06105, 288.80945, 210.74915, 225.5012, 141.45505]
2025-05-06 15:32:18,531 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [148.0, 109.0, 125.0, 149.0, 105.0, 116.0, 141.0, 116.0, 121.0, 93.0]
2025-05-06 15:32:18,537 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 48/100 (estimated time remaining: 2 hours, 26 minutes, 11 seconds)
2025-05-06 15:35:02,850 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 15:35:05,274 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 181.19548 ± 101.285
2025-05-06 15:35:05,274 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [157.99275, 336.20822, 163.77397, 174.87457, 234.813, 283.72806, 289.45862, 121.92581, 25.914185, 23.265488]
2025-05-06 15:35:05,274 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [105.0, 163.0, 100.0, 112.0, 132.0, 141.0, 137.0, 86.0, 27.0, 25.0]
2025-05-06 15:35:05,281 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 49/100 (estimated time remaining: 2 hours, 23 minutes, 48 seconds)
2025-05-06 15:37:48,640 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 15:37:51,464 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 235.62363 ± 73.058
2025-05-06 15:37:51,464 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [247.4681, 315.84518, 341.60388, 204.97379, 91.673225, 196.44856, 328.17593, 240.4837, 209.71126, 179.85269]
2025-05-06 15:37:51,464 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [129.0, 156.0, 150.0, 114.0, 60.0, 109.0, 145.0, 121.0, 111.0, 103.0]
2025-05-06 15:37:51,471 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 50/100 (estimated time remaining: 2 hours, 21 minutes, 16 seconds)
2025-05-06 15:40:33,991 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 15:40:37,053 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 246.82854 ± 62.920
2025-05-06 15:40:37,054 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [266.4895, 268.1599, 241.53024, 284.64938, 144.79822, 115.34783, 255.793, 286.5577, 272.9922, 331.96725]
2025-05-06 15:40:37,054 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [133.0, 142.0, 122.0, 134.0, 93.0, 81.0, 136.0, 153.0, 155.0, 150.0]
2025-05-06 15:40:37,061 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 51/100 (estimated time remaining: 2 hours, 18 minutes, 29 seconds)
2025-05-06 15:43:22,314 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 15:43:25,454 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 271.38397 ± 55.015
2025-05-06 15:43:25,454 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [257.37515, 262.3537, 128.06773, 274.88406, 271.0371, 294.54306, 271.98303, 277.9764, 333.8488, 341.77087]
2025-05-06 15:43:25,454 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [130.0, 133.0, 88.0, 144.0, 136.0, 140.0, 140.0, 133.0, 146.0, 140.0]
2025-05-06 15:43:25,455 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1124 [INFO]: New best (271.38) for latency SparseU15
2025-05-06 15:43:25,455 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1127 [INFO]: saving network
2025-05-06 15:43:25,459 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc3/noisy-hopper/SparseU15-sac-aug-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 15:43:25,471 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 52/100 (estimated time remaining: 2 hours, 16 minutes, 6 seconds)
2025-05-06 15:46:08,212 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 15:46:10,830 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 217.65015 ± 104.394
2025-05-06 15:46:10,830 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [196.38023, 286.13812, 249.63344, 25.215439, 272.98624, 298.5207, 255.10416, 352.65268, 26.50636, 213.36406]
2025-05-06 15:46:10,831 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [109.0, 137.0, 130.0, 29.0, 137.0, 140.0, 131.0, 146.0, 32.0, 115.0]
2025-05-06 15:46:10,838 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 53/100 (estimated time remaining: 2 hours, 13 minutes, 10 seconds)
2025-05-06 15:48:54,188 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 15:48:56,871 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 216.96773 ± 78.902
2025-05-06 15:48:56,872 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [204.1408, 311.03473, 22.233345, 241.23392, 256.50778, 179.71587, 323.42258, 199.51385, 207.5319, 224.3424]
2025-05-06 15:48:56,872 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [114.0, 140.0, 24.0, 125.0, 132.0, 113.0, 157.0, 106.0, 106.0, 118.0]
2025-05-06 15:48:56,879 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 54/100 (estimated time remaining: 2 hours, 10 minutes, 17 seconds)
2025-05-06 15:51:40,237 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 15:51:43,077 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 227.79549 ± 70.901
2025-05-06 15:51:43,078 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [144.63274, 263.42303, 310.72375, 291.0644, 262.32504, 277.9679, 123.18192, 262.8082, 237.44789, 104.37989]
2025-05-06 15:51:43,078 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [95.0, 134.0, 145.0, 149.0, 135.0, 136.0, 82.0, 126.0, 122.0, 79.0]
2025-05-06 15:51:43,085 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 55/100 (estimated time remaining: 2 hours, 7 minutes, 30 seconds)
2025-05-06 15:54:25,906 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 15:54:28,667 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 235.06978 ± 97.562
2025-05-06 15:54:28,667 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [161.37549, 258.4247, 28.942286, 245.95598, 205.67949, 348.70096, 234.43987, 211.88103, 414.66895, 240.62898]
2025-05-06 15:54:28,667 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [102.0, 126.0, 33.0, 123.0, 114.0, 153.0, 119.0, 108.0, 174.0, 121.0]
2025-05-06 15:54:28,675 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 56/100 (estimated time remaining: 2 hours, 4 minutes, 44 seconds)
2025-05-06 15:57:12,119 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 15:57:15,119 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 252.71078 ± 89.107
2025-05-06 15:57:15,119 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [27.68444, 245.90334, 226.49672, 298.8175, 199.3444, 281.36185, 286.74765, 376.3854, 328.68585, 255.6806]
2025-05-06 15:57:15,119 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [31.0, 128.0, 135.0, 139.0, 113.0, 153.0, 133.0, 161.0, 151.0, 127.0]
2025-05-06 15:57:15,127 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 57/100 (estimated time remaining: 2 hours, 1 minute, 40 seconds)
2025-05-06 15:59:59,956 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 16:00:02,046 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 153.99130 ± 96.357
2025-05-06 16:00:02,046 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [104.194916, 346.1591, 111.53076, 118.21126, 23.833448, 28.827879, 187.87808, 201.88373, 273.19867, 144.19514]
2025-05-06 16:00:02,046 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [72.0, 154.0, 77.0, 82.0, 31.0, 32.0, 110.0, 108.0, 134.0, 89.0]
2025-05-06 16:00:02,054 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 59 minutes, 8 seconds)
2025-05-06 16:02:45,889 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 16:02:48,086 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 160.15085 ± 71.328
2025-05-06 16:02:48,086 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [263.6464, 242.96532, 102.57506, 247.50032, 34.39343, 104.27878, 137.08237, 149.38187, 122.57925, 197.10583]
2025-05-06 16:02:48,086 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [129.0, 128.0, 71.0, 125.0, 37.0, 70.0, 87.0, 97.0, 84.0, 106.0]
2025-05-06 16:02:48,094 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 56 minutes, 22 seconds)
2025-05-06 16:05:29,676 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 16:05:32,442 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 227.57483 ± 100.424
2025-05-06 16:05:32,442 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [319.53204, 28.900244, 259.3701, 241.573, 329.75034, 111.56155, 281.16623, 337.6254, 249.78377, 116.485664]
2025-05-06 16:05:32,442 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [150.0, 33.0, 131.0, 131.0, 148.0, 85.0, 138.0, 148.0, 127.0, 83.0]
2025-05-06 16:05:32,450 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 53 minutes, 20 seconds)
2025-05-06 16:08:16,515 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 16:08:19,563 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 262.65146 ± 63.063
2025-05-06 16:08:19,564 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [358.12155, 336.94476, 337.73538, 213.39915, 263.90436, 233.27223, 242.08614, 261.45715, 138.96936, 240.6246]
2025-05-06 16:08:19,564 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [162.0, 147.0, 147.0, 111.0, 132.0, 114.0, 124.0, 135.0, 97.0, 124.0]
2025-05-06 16:08:19,572 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 50 minutes, 47 seconds)
2025-05-06 16:11:01,777 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 16:11:04,258 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 198.55807 ± 77.111
2025-05-06 16:11:04,258 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [162.36171, 209.31389, 228.43266, 269.4237, 151.39635, 27.015202, 311.12793, 279.19003, 178.95323, 168.36592]
2025-05-06 16:11:04,258 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [97.0, 117.0, 121.0, 137.0, 95.0, 30.0, 149.0, 130.0, 95.0, 104.0]
2025-05-06 16:11:04,266 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 47 minutes, 47 seconds)
2025-05-06 16:13:47,755 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 16:13:49,888 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 159.96048 ± 93.254
2025-05-06 16:13:49,888 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [261.80344, 153.24402, 30.55401, 125.21129, 252.15475, 115.916725, 27.366121, 272.74994, 272.56107, 88.04342]
2025-05-06 16:13:49,888 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [129.0, 90.0, 32.0, 86.0, 132.0, 81.0, 32.0, 131.0, 137.0, 70.0]
2025-05-06 16:13:49,897 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 44 minutes, 51 seconds)
2025-05-06 16:16:32,064 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 16:16:34,834 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 240.56641 ± 76.018
2025-05-06 16:16:34,834 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [221.2806, 248.49416, 338.64383, 332.74942, 332.62064, 144.16318, 186.95708, 131.0604, 177.67557, 292.01938]
2025-05-06 16:16:34,834 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [113.0, 133.0, 136.0, 139.0, 137.0, 95.0, 100.0, 91.0, 106.0, 147.0]
2025-05-06 16:16:34,843 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 41 minutes, 57 seconds)
2025-05-06 16:19:18,198 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 16:19:20,812 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 226.95560 ± 102.135
2025-05-06 16:19:20,812 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [269.9398, 102.76355, 310.4089, 314.15808, 284.97272, 177.9722, 316.57904, 26.854729, 329.30212, 136.6048]
2025-05-06 16:19:20,813 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [122.0, 72.0, 138.0, 146.0, 141.0, 100.0, 150.0, 29.0, 145.0, 90.0]
2025-05-06 16:19:20,821 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 39 minutes, 24 seconds)
2025-05-06 16:22:01,854 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 16:22:04,844 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 263.58203 ± 58.029
2025-05-06 16:22:04,844 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [326.7288, 330.33206, 202.65482, 182.15071, 182.95105, 268.87323, 293.56122, 217.88493, 317.9034, 312.78015]
2025-05-06 16:22:04,844 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [141.0, 155.0, 114.0, 107.0, 107.0, 129.0, 140.0, 112.0, 142.0, 147.0]
2025-05-06 16:22:04,853 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 36 minutes, 16 seconds)
2025-05-06 16:24:50,229 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 16:24:52,921 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 226.13962 ± 84.451
2025-05-06 16:24:52,921 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [292.82315, 172.15964, 186.3235, 279.9323, 310.26562, 330.02625, 262.84076, 195.27138, 29.287802, 202.46591]
2025-05-06 16:24:52,921 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [128.0, 98.0, 104.0, 142.0, 138.0, 147.0, 136.0, 111.0, 32.0, 115.0]
2025-05-06 16:24:52,930 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 33 minutes, 54 seconds)
2025-05-06 16:27:38,380 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 16:27:40,600 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 173.81929 ± 107.036
2025-05-06 16:27:40,600 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [371.75098, 88.906105, 219.7014, 225.28773, 25.132072, 264.69492, 105.90179, 147.92368, 30.316702, 258.57745]
2025-05-06 16:27:40,600 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [157.0, 67.0, 110.0, 122.0, 27.0, 133.0, 77.0, 104.0, 31.0, 134.0]
2025-05-06 16:27:40,609 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 31 minutes, 22 seconds)
2025-05-06 16:30:22,611 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 16:30:25,016 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 204.84421 ± 96.054
2025-05-06 16:30:25,016 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [249.65213, 250.91643, 195.8298, 30.125372, 315.2398, 179.00793, 245.01935, 30.140848, 310.43494, 242.07553]
2025-05-06 16:30:25,016 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [127.0, 126.0, 111.0, 32.0, 137.0, 99.0, 127.0, 32.0, 130.0, 122.0]
2025-05-06 16:30:25,025 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 28 minutes, 33 seconds)
2025-05-06 16:33:06,853 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 16:33:10,072 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 295.07925 ± 81.784
2025-05-06 16:33:10,073 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [366.45407, 316.0759, 231.33194, 204.79913, 181.08034, 365.1913, 195.74562, 334.30295, 322.68204, 433.12903]
2025-05-06 16:33:10,073 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [166.0, 148.0, 116.0, 111.0, 110.0, 155.0, 116.0, 144.0, 157.0, 165.0]
2025-05-06 16:33:10,073 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1124 [INFO]: New best (295.08) for latency SparseU15
2025-05-06 16:33:10,073 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1127 [INFO]: saving network
2025-05-06 16:33:10,077 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc3/noisy-hopper/SparseU15-sac-aug-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 16:33:10,091 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 25 minutes, 41 seconds)
2025-05-06 16:35:52,487 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 16:35:54,774 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 184.46909 ± 123.545
2025-05-06 16:35:54,774 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [213.98433, 26.014385, 317.84753, 260.44485, 174.76843, 306.94714, 125.79673, 28.145119, 22.338413, 368.40408]
2025-05-06 16:35:54,774 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [122.0, 29.0, 143.0, 131.0, 104.0, 141.0, 81.0, 30.0, 27.0, 180.0]
2025-05-06 16:35:54,784 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 22 minutes, 59 seconds)
2025-05-06 16:38:37,341 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 16:38:39,838 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 208.40417 ± 127.677
2025-05-06 16:38:39,838 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [376.40744, 201.65341, 97.50372, 291.34286, 412.48053, 325.51385, 95.50265, 116.776184, 20.882978, 145.97823]
2025-05-06 16:38:39,838 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [163.0, 115.0, 65.0, 127.0, 172.0, 150.0, 75.0, 86.0, 24.0, 103.0]
2025-05-06 16:38:39,847 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 19 minutes, 56 seconds)
2025-05-06 16:41:21,533 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 16:41:24,489 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 267.10745 ± 100.974
2025-05-06 16:41:24,489 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [25.578426, 342.42575, 330.12042, 229.50978, 256.05017, 333.59485, 164.83672, 361.24185, 267.32977, 360.38678]
2025-05-06 16:41:24,489 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [30.0, 144.0, 148.0, 138.0, 131.0, 147.0, 106.0, 147.0, 127.0, 160.0]
2025-05-06 16:41:24,498 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 16 minutes, 53 seconds)
2025-05-06 16:44:07,688 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 16:44:10,331 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 225.68462 ± 95.218
2025-05-06 16:44:10,331 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [372.02194, 143.55478, 291.50348, 274.61517, 322.65756, 30.350023, 205.75928, 142.8063, 255.58423, 217.99333]
2025-05-06 16:44:10,331 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [153.0, 105.0, 131.0, 134.0, 142.0, 32.0, 117.0, 93.0, 130.0, 108.0]
2025-05-06 16:44:10,341 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 14 minutes, 16 seconds)
2025-05-06 16:46:53,577 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 16:46:56,612 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 264.31476 ± 70.976
2025-05-06 16:46:56,612 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [342.93887, 287.4758, 393.69373, 327.40823, 269.2005, 211.00063, 223.76343, 249.40459, 162.63072, 175.63141]
2025-05-06 16:46:56,612 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [170.0, 133.0, 167.0, 140.0, 127.0, 118.0, 122.0, 129.0, 105.0, 105.0]
2025-05-06 16:46:56,622 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 11 minutes, 37 seconds)
2025-05-06 16:49:39,718 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 16:49:42,083 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 189.19577 ± 101.166
2025-05-06 16:49:42,084 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [314.35385, 140.96948, 163.9882, 303.6513, 18.63291, 148.50166, 247.72112, 278.63455, 245.35063, 30.154104]
2025-05-06 16:49:42,084 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [141.0, 92.0, 96.0, 144.0, 21.0, 99.0, 130.0, 127.0, 143.0, 32.0]
2025-05-06 16:49:42,093 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 8 minutes, 56 seconds)
2025-05-06 16:52:24,112 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 16:52:27,211 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 274.23755 ± 102.086
2025-05-06 16:52:27,211 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [444.0329, 372.37714, 398.4726, 250.71152, 170.81259, 333.49158, 204.69243, 259.96777, 181.8259, 125.99096]
2025-05-06 16:52:27,211 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [184.0, 156.0, 168.0, 127.0, 99.0, 161.0, 116.0, 138.0, 107.0, 83.0]
2025-05-06 16:52:27,221 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 6 minutes, 11 seconds)
2025-05-06 16:55:11,121 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 16:55:14,113 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 256.70312 ± 116.321
2025-05-06 16:55:14,114 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [264.35828, 141.13698, 200.69818, 393.70673, 171.33513, 409.0751, 351.13382, 103.970856, 401.95334, 129.66277]
2025-05-06 16:55:14,114 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [131.0, 93.0, 113.0, 173.0, 104.0, 186.0, 165.0, 73.0, 158.0, 91.0]
2025-05-06 16:55:14,124 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 3 minutes, 36 seconds)
2025-05-06 16:57:56,589 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 16:57:59,642 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 276.83575 ± 78.748
2025-05-06 16:57:59,643 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [343.79916, 373.87823, 179.93263, 370.1725, 129.8152, 268.34772, 310.93726, 317.89328, 270.4484, 203.13307]
2025-05-06 16:57:59,643 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [145.0, 159.0, 105.0, 156.0, 87.0, 133.0, 143.0, 140.0, 142.0, 109.0]
2025-05-06 16:57:59,653 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 48 seconds)
2025-05-06 17:00:41,720 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 17:00:44,279 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 225.29868 ± 124.223
2025-05-06 17:00:44,279 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [117.573494, 30.250395, 370.4154, 239.27989, 340.95676, 201.3681, 314.5933, 241.50368, 371.42615, 25.619661]
2025-05-06 17:00:44,279 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [82.0, 32.0, 149.0, 122.0, 150.0, 110.0, 138.0, 131.0, 164.0, 28.0]
2025-05-06 17:00:44,310 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 80/100 (estimated time remaining: 57 minutes, 56 seconds)
2025-05-06 17:03:28,799 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 17:03:31,626 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 268.44550 ± 126.497
2025-05-06 17:03:31,626 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [358.78226, 405.09634, 189.12675, 406.17975, 86.6458, 316.56033, 316.02756, 365.22113, 27.959759, 212.85513]
2025-05-06 17:03:31,626 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [143.0, 159.0, 110.0, 168.0, 59.0, 144.0, 141.0, 154.0, 31.0, 112.0]
2025-05-06 17:03:31,637 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 81/100 (estimated time remaining: 55 minutes, 18 seconds)
2025-05-06 17:06:13,514 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 17:06:16,351 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 254.25288 ± 105.920
2025-05-06 17:06:16,351 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [374.57535, 364.97253, 149.42471, 224.47592, 329.17435, 245.65727, 30.765661, 191.57982, 259.36508, 372.53845]
2025-05-06 17:06:16,351 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [163.0, 152.0, 88.0, 119.0, 146.0, 129.0, 33.0, 112.0, 131.0, 152.0]
2025-05-06 17:06:16,362 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 82/100 (estimated time remaining: 52 minutes, 30 seconds)
2025-05-06 17:08:59,047 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 17:09:02,030 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 270.51428 ± 71.846
2025-05-06 17:09:02,030 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [189.52338, 317.9589, 321.7049, 171.00056, 262.3416, 181.2589, 259.48178, 363.70566, 385.1404, 253.02695]
2025-05-06 17:09:02,030 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [104.0, 145.0, 139.0, 109.0, 126.0, 102.0, 129.0, 155.0, 158.0, 125.0]
2025-05-06 17:09:02,041 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 83/100 (estimated time remaining: 49 minutes, 40 seconds)
2025-05-06 17:11:45,949 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 17:11:48,811 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 259.38535 ± 103.987
2025-05-06 17:11:48,811 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [299.93195, 321.6182, 105.874725, 288.93466, 27.545805, 344.98, 343.50345, 358.69748, 247.18242, 255.5846]
2025-05-06 17:11:48,811 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [135.0, 139.0, 72.0, 142.0, 32.0, 158.0, 143.0, 147.0, 135.0, 133.0]
2025-05-06 17:11:48,822 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 84/100 (estimated time remaining: 46 minutes, 59 seconds)
2025-05-06 17:14:31,365 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 17:14:34,079 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 227.54851 ± 110.171
2025-05-06 17:14:34,080 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [371.4685, 94.02776, 262.26926, 161.92564, 232.78224, 375.4952, 147.36429, 108.68835, 388.4973, 132.96645]
2025-05-06 17:14:34,080 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [163.0, 71.0, 131.0, 98.0, 127.0, 167.0, 97.0, 72.0, 157.0, 87.0]
2025-05-06 17:14:34,091 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 85/100 (estimated time remaining: 44 minutes, 15 seconds)
2025-05-06 17:17:18,143 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 17:17:20,642 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 211.83960 ± 133.360
2025-05-06 17:17:20,642 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [192.45924, 231.95638, 336.0664, 351.46942, 98.20363, 25.531988, 318.36868, 414.10144, 25.762403, 124.47635]
2025-05-06 17:17:20,642 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [111.0, 119.0, 144.0, 169.0, 75.0, 31.0, 145.0, 169.0, 30.0, 86.0]
2025-05-06 17:17:20,653 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 86/100 (estimated time remaining: 41 minutes, 27 seconds)
2025-05-06 17:20:03,507 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 17:20:06,062 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 213.43416 ± 113.909
2025-05-06 17:20:06,063 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [354.58276, 103.41083, 166.93134, 351.40753, 27.15091, 106.42303, 179.33904, 362.6077, 189.22804, 293.2606]
2025-05-06 17:20:06,063 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [157.0, 73.0, 96.0, 149.0, 32.0, 83.0, 98.0, 156.0, 115.0, 133.0]
2025-05-06 17:20:06,074 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 87/100 (estimated time remaining: 38 minutes, 43 seconds)
2025-05-06 17:22:51,880 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 17:22:54,434 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 206.60527 ± 106.393
2025-05-06 17:22:54,434 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [155.7274, 356.84534, 146.39728, 193.05666, 99.93469, 165.81953, 390.43555, 312.50333, 200.39746, 44.93547]
2025-05-06 17:22:54,434 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [99.0, 151.0, 92.0, 111.0, 71.0, 106.0, 167.0, 146.0, 112.0, 44.0]
2025-05-06 17:22:54,446 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 88/100 (estimated time remaining: 36 minutes, 4 seconds)
2025-05-06 17:25:38,584 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 17:25:41,057 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 200.61099 ± 122.115
2025-05-06 17:25:41,057 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [423.91003, 198.35715, 280.52338, 181.06248, 340.13632, 179.02162, 26.638025, 112.181816, 244.14265, 20.136518]
2025-05-06 17:25:41,057 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [179.0, 113.0, 136.0, 103.0, 151.0, 112.0, 29.0, 77.0, 138.0, 25.0]
2025-05-06 17:25:41,068 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 89/100 (estimated time remaining: 33 minutes, 17 seconds)
2025-05-06 17:28:24,445 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 17:28:27,456 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 279.16644 ± 92.276
2025-05-06 17:28:27,456 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [297.35458, 325.78534, 124.520035, 315.36447, 318.71573, 274.72812, 272.0104, 410.3972, 354.83304, 97.95541]
2025-05-06 17:28:27,456 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [132.0, 144.0, 87.0, 141.0, 138.0, 135.0, 127.0, 160.0, 156.0, 71.0]
2025-05-06 17:28:27,468 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 90/100 (estimated time remaining: 30 minutes, 33 seconds)
2025-05-06 17:31:11,375 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 17:31:13,703 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 198.55322 ± 152.385
2025-05-06 17:31:13,704 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [190.66515, 27.948446, 186.19092, 28.74333, 387.65552, 221.62715, 116.99434, 488.72205, 18.314753, 318.6707]
2025-05-06 17:31:13,704 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [113.0, 30.0, 107.0, 32.0, 162.0, 121.0, 82.0, 193.0, 21.0, 142.0]
2025-05-06 17:31:13,715 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 91/100 (estimated time remaining: 27 minutes, 46 seconds)
2025-05-06 17:33:58,472 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 17:34:01,395 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 266.23251 ± 109.065
2025-05-06 17:34:01,396 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [358.5605, 278.71375, 26.503803, 164.36703, 324.15607, 332.8005, 346.3187, 345.61444, 140.6382, 344.65225]
2025-05-06 17:34:01,396 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [160.0, 138.0, 32.0, 102.0, 146.0, 145.0, 152.0, 143.0, 97.0, 144.0]
2025-05-06 17:34:01,407 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 92/100 (estimated time remaining: 25 minutes, 3 seconds)
2025-05-06 17:36:46,080 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 17:36:48,609 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 225.83084 ± 140.296
2025-05-06 17:36:48,609 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [381.94766, 29.083569, 323.6383, 142.63753, 439.78992, 313.30966, 134.78633, 323.5932, 140.3543, 29.16806]
2025-05-06 17:36:48,609 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [154.0, 30.0, 137.0, 91.0, 171.0, 143.0, 91.0, 146.0, 95.0, 29.0]
2025-05-06 17:36:48,621 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 93/100 (estimated time remaining: 22 minutes, 14 seconds)
2025-05-06 17:39:31,522 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 17:39:34,774 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 308.37808 ± 81.247
2025-05-06 17:39:34,774 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [129.99083, 334.95972, 366.11557, 387.66577, 331.66507, 380.09268, 247.97076, 372.68002, 324.86517, 207.77524]
2025-05-06 17:39:34,774 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [91.0, 144.0, 161.0, 154.0, 145.0, 158.0, 129.0, 159.0, 147.0, 105.0]
2025-05-06 17:39:34,774 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1124 [INFO]: New best (308.38) for latency SparseU15
2025-05-06 17:39:34,775 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1127 [INFO]: saving network
2025-05-06 17:39:34,778 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc3/noisy-hopper/SparseU15-sac-aug-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 17:39:34,795 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 94/100 (estimated time remaining: 19 minutes, 27 seconds)
2025-05-06 17:42:19,509 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 17:42:22,649 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 263.20877 ± 125.404
2025-05-06 17:42:22,649 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [411.28952, 135.77458, 330.3898, 274.83786, 416.21353, 27.811392, 246.92819, 387.1238, 279.0519, 122.667046]
2025-05-06 17:42:22,649 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [193.0, 87.0, 152.0, 133.0, 166.0, 29.0, 162.0, 186.0, 148.0, 90.0]
2025-05-06 17:42:22,662 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 95/100 (estimated time remaining: 16 minutes, 42 seconds)
2025-05-06 17:45:06,222 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 17:45:09,507 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 315.47015 ± 51.491
2025-05-06 17:45:09,508 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [249.23306, 316.82962, 324.55762, 327.5975, 270.5068, 229.40666, 389.3045, 368.45584, 378.52887, 300.28104]
2025-05-06 17:45:09,508 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [122.0, 139.0, 144.0, 143.0, 123.0, 114.0, 155.0, 158.0, 160.0, 146.0]
2025-05-06 17:45:09,508 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1124 [INFO]: New best (315.47) for latency SparseU15
2025-05-06 17:45:09,508 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1127 [INFO]: saving network
2025-05-06 17:45:09,512 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc3/noisy-hopper/SparseU15-sac-aug-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 17:45:09,529 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 96/100 (estimated time remaining: 13 minutes, 55 seconds)
2025-05-06 17:47:52,446 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 17:47:55,282 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 261.72244 ± 137.249
2025-05-06 17:47:55,282 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [386.61917, 201.16302, 25.126825, 345.9265, 324.49103, 195.30806, 22.190836, 332.1142, 418.01816, 366.2665]
2025-05-06 17:47:55,282 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [160.0, 112.0, 30.0, 158.0, 142.0, 109.0, 24.0, 152.0, 174.0, 163.0]
2025-05-06 17:47:55,295 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 97/100 (estimated time remaining: 11 minutes, 7 seconds)
2025-05-06 17:50:39,534 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 17:50:42,385 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 234.09848 ± 63.270
2025-05-06 17:50:42,385 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [252.54935, 145.37102, 342.76776, 160.4603, 294.21613, 144.57124, 223.81067, 274.89218, 233.79768, 268.54865]
2025-05-06 17:50:42,385 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [128.0, 104.0, 148.0, 102.0, 131.0, 100.0, 119.0, 133.0, 123.0, 143.0]
2025-05-06 17:50:42,398 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 98/100 (estimated time remaining: 8 minutes, 20 seconds)
2025-05-06 17:53:27,331 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 17:53:30,426 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 260.86786 ± 75.961
2025-05-06 17:53:30,426 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [261.80405, 160.86086, 160.70525, 407.7048, 331.9993, 238.35684, 200.3392, 342.54593, 257.2668, 247.09544]
2025-05-06 17:53:30,426 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [128.0, 104.0, 109.0, 193.0, 154.0, 126.0, 115.0, 149.0, 127.0, 133.0]
2025-05-06 17:53:30,439 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 99/100 (estimated time remaining: 5 minutes, 34 seconds)
2025-05-06 17:56:16,587 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 17:56:19,954 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 275.72211 ± 130.243
2025-05-06 17:56:19,955 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [23.059225, 353.80573, 253.5883, 114.88854, 399.54013, 414.5747, 313.24545, 302.628, 425.3585, 156.53238]
2025-05-06 17:56:19,955 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [29.0, 160.0, 130.0, 82.0, 178.0, 189.0, 173.0, 163.0, 208.0, 103.0]
2025-05-06 17:56:19,967 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1097 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 47 seconds)
2025-05-06 17:59:04,232 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 17:59:06,967 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1119 [DEBUG]: Total Reward: 239.13986 ± 87.128
2025-05-06 17:59:06,967 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1120 [DEBUG]: All rewards: [208.38095, 114.71285, 301.29077, 154.04716, 214.41895, 256.6979, 124.315575, 323.035, 303.30188, 391.19766]
2025-05-06 17:59:06,967 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [118.0, 80.0, 138.0, 86.0, 112.0, 127.0, 82.0, 142.0, 132.0, 163.0]
2025-05-06 17:59:06,980 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1149 [DEBUG]: Training session finished
