2026-01-25 17:02:44,048 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1156 [DEBUG]: logdir: _logs/benchmark-v3-tc10/noisy-hopper/DatasetOffice-sac
2026-01-25 17:02:44,048 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1157 [DEBUG]: trainer_prefix: benchmark-v3-tc10/noisy-hopper/DatasetOffice-sac
2026-01-25 17:02:44,048 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1158 [DEBUG]: args.trainer_eval_latencies: {'DatasetOffice': <latency_env.delayed_mdp.DatasetDelay object at 0x14e99d59db50>}
2026-01-25 17:02:44,048 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1159 [DEBUG]: using device: cuda
2026-01-25 17:02:44,190 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1181 [INFO]: Creating new trainer
2026-01-25 17:02:44,195 baseline-sac-noisy-hopper:111 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=11, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2.]]), shift: tensor([[-1., -1., -1.]]))
)
2026-01-25 17:02:44,195 baseline-sac-noisy-hopper:112 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=14, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2026-01-25 17:02:44,771 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1242 [DEBUG]: Starting training session...
2026-01-25 17:02:44,772 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 1/100
2026-01-25 17:04:02,640 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:04:02,973 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 72.98049 ± 44.268
2026-01-25 17:04:02,973 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [31.310736, 172.10231, 53.408833, 31.685282, 139.51353, 58.696625, 77.953896, 45.87448, 49.637672, 69.621574]
2026-01-25 17:04:02,973 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [23.0, 79.0, 35.0, 23.0, 71.0, 35.0, 44.0, 28.0, 30.0, 42.0]
2026-01-25 17:04:02,973 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1274 [INFO]: New best (72.98) for latency DatasetOffice
2026-01-25 17:04:02,977 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 9 minutes, 2 seconds)
2026-01-25 17:05:28,346 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:05:29,088 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 206.05331 ± 3.126
2026-01-25 17:05:29,088 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [201.63632, 210.41643, 204.30684, 204.90079, 207.73695, 200.55379, 206.28214, 207.00761, 210.44743, 207.24487]
2026-01-25 17:05:29,088 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [90.0, 93.0, 91.0, 90.0, 94.0, 90.0, 91.0, 91.0, 95.0, 93.0]
2026-01-25 17:05:29,088 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1274 [INFO]: New best (206.05) for latency DatasetOffice
2026-01-25 17:05:29,095 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 14 minutes, 11 seconds)
2026-01-25 17:06:53,867 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:06:54,495 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 171.30150 ± 70.682
2026-01-25 17:06:54,495 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [216.73962, 218.17964, 205.8479, 169.35258, 45.657623, 205.69081, 19.970963, 212.54727, 211.56285, 207.4657]
2026-01-25 17:06:54,495 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [94.0, 95.0, 90.0, 78.0, 32.0, 92.0, 18.0, 93.0, 93.0, 91.0]
2026-01-25 17:06:54,499 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 14 minutes, 34 seconds)
2026-01-25 17:08:19,115 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:08:19,818 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 191.38213 ± 15.940
2026-01-25 17:08:19,818 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [212.38617, 186.03276, 177.74219, 171.29285, 173.03639, 212.04451, 184.26419, 181.14307, 210.60733, 205.2719]
2026-01-25 17:08:19,818 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [93.0, 85.0, 82.0, 82.0, 81.0, 93.0, 84.0, 83.0, 93.0, 91.0]
2026-01-25 17:08:19,824 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 14 minutes, 1 second)
2026-01-25 17:09:45,195 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:09:45,926 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 208.71053 ± 4.064
2026-01-25 17:09:45,926 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [210.76039, 212.15466, 205.83528, 201.3185, 205.79225, 205.30005, 207.62096, 213.32712, 215.21944, 209.7767]
2026-01-25 17:09:45,926 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [92.0, 93.0, 90.0, 89.0, 90.0, 90.0, 91.0, 93.0, 94.0, 92.0]
2026-01-25 17:09:45,926 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1274 [INFO]: New best (208.71) for latency DatasetOffice
2026-01-25 17:09:45,930 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 13 minutes, 22 seconds)
2026-01-25 17:11:10,045 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:11:10,761 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 203.79781 ± 2.141
2026-01-25 17:11:10,761 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [203.9291, 203.32362, 201.75735, 199.31335, 203.09229, 204.51134, 204.74348, 203.63017, 206.17148, 207.5058]
2026-01-25 17:11:10,761 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [90.0, 90.0, 89.0, 88.0, 89.0, 90.0, 90.0, 90.0, 91.0, 91.0]
2026-01-25 17:11:10,771 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 14 minutes, 2 seconds)
2026-01-25 17:12:35,204 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:12:35,925 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 204.63141 ± 4.706
2026-01-25 17:12:35,925 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [203.66805, 212.00803, 207.57541, 210.45277, 199.66618, 198.58322, 207.2892, 197.44955, 206.15343, 203.46829]
2026-01-25 17:12:35,925 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [90.0, 93.0, 91.0, 92.0, 88.0, 88.0, 91.0, 88.0, 91.0, 90.0]
2026-01-25 17:12:35,928 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 12 minutes, 19 seconds)
2026-01-25 17:14:00,290 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:14:00,951 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 187.05734 ± 60.255
2026-01-25 17:14:00,951 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [200.77333, 203.16335, 208.10233, 6.83961, 203.97467, 210.29321, 212.82184, 200.12791, 214.35039, 210.12701]
2026-01-25 17:14:00,951 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [89.0, 90.0, 91.0, 8.0, 88.0, 92.0, 93.0, 89.0, 94.0, 92.0]
2026-01-25 17:14:00,955 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 10 minutes, 46 seconds)
2026-01-25 17:15:25,763 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:15:26,418 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 183.92944 ± 53.573
2026-01-25 17:15:26,418 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [206.91815, 211.08418, 212.2669, 209.39255, 201.0911, 209.13345, 209.27057, 208.70782, 133.17448, 38.255222]
2026-01-25 17:15:26,418 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [90.0, 92.0, 92.0, 91.0, 88.0, 91.0, 91.0, 91.0, 67.0, 27.0]
2026-01-25 17:15:26,422 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 9 minutes, 24 seconds)
2026-01-25 17:16:50,666 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:16:51,372 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 198.13466 ± 4.537
2026-01-25 17:16:51,372 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [199.93008, 194.93571, 192.32204, 199.4582, 194.40738, 196.0048, 205.01222, 196.03305, 207.20534, 196.03772]
2026-01-25 17:16:51,372 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [88.0, 87.0, 85.0, 88.0, 86.0, 87.0, 90.0, 87.0, 91.0, 87.0]
2026-01-25 17:16:51,376 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 7 minutes, 38 seconds)
2026-01-25 17:18:15,948 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:18:16,674 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 207.35434 ± 3.759
2026-01-25 17:18:16,674 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [205.0612, 204.1348, 209.06944, 206.07956, 212.79262, 214.79182, 204.91597, 205.53833, 208.79456, 202.36525]
2026-01-25 17:18:16,674 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [90.0, 90.0, 91.0, 90.0, 92.0, 93.0, 90.0, 90.0, 92.0, 89.0]
2026-01-25 17:18:16,677 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 6 minutes, 21 seconds)
2026-01-25 17:19:41,653 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:19:42,328 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 184.88878 ± 51.626
2026-01-25 17:19:42,328 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [206.7151, 205.43555, 194.83401, 190.79121, 201.62053, 200.1146, 209.60323, 208.18109, 30.92244, 200.67001]
2026-01-25 17:19:42,328 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [90.0, 90.0, 86.0, 85.0, 88.0, 88.0, 91.0, 91.0, 51.0, 88.0]
2026-01-25 17:19:42,333 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 5 minutes, 4 seconds)
2026-01-25 17:21:06,214 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:21:06,926 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 202.70183 ± 5.871
2026-01-25 17:21:06,926 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [202.73915, 210.36833, 203.35341, 190.51074, 198.02222, 200.25, 210.79128, 203.16144, 199.41951, 208.40228]
2026-01-25 17:21:06,926 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [89.0, 92.0, 89.0, 85.0, 87.0, 88.0, 92.0, 89.0, 88.0, 91.0]
2026-01-25 17:21:06,933 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 3 minutes, 32 seconds)
2026-01-25 17:22:31,280 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:22:32,004 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 203.49016 ± 10.199
2026-01-25 17:22:32,005 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [208.68417, 200.75664, 196.67941, 181.06499, 194.27792, 213.52979, 201.67798, 210.29524, 214.90305, 213.03249]
2026-01-25 17:22:32,005 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [91.0, 89.0, 87.0, 83.0, 86.0, 93.0, 89.0, 92.0, 94.0, 93.0]
2026-01-25 17:22:32,011 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 2 minutes)
2026-01-25 17:23:56,859 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:23:57,570 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 197.14597 ± 19.177
2026-01-25 17:23:57,570 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [206.3914, 211.1134, 203.85416, 204.58586, 207.87424, 161.40842, 156.84459, 207.60468, 208.89037, 202.89246]
2026-01-25 17:23:57,570 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [90.0, 92.0, 89.0, 89.0, 90.0, 79.0, 79.0, 91.0, 91.0, 89.0]
2026-01-25 17:23:57,574 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 45 seconds)
2026-01-25 17:25:23,074 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:25:23,807 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 206.03418 ± 5.609
2026-01-25 17:25:23,807 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [205.06927, 210.27528, 211.2123, 208.68423, 206.98132, 205.5799, 210.30734, 210.55219, 193.17398, 198.50609]
2026-01-25 17:25:23,807 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [91.0, 92.0, 92.0, 91.0, 90.0, 90.0, 92.0, 92.0, 86.0, 88.0]
2026-01-25 17:25:23,811 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 17/100 (estimated time remaining: 1 hour, 59 minutes, 35 seconds)
2026-01-25 17:26:48,593 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:26:49,332 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 208.02991 ± 4.855
2026-01-25 17:26:49,332 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [206.11122, 215.63736, 211.97858, 198.58644, 204.90553, 212.23834, 202.26625, 208.48787, 209.61983, 210.46756]
2026-01-25 17:26:49,332 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [91.0, 94.0, 92.0, 88.0, 90.0, 92.0, 89.0, 92.0, 92.0, 92.0]
2026-01-25 17:26:49,336 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 18/100 (estimated time remaining: 1 hour, 58 minutes, 8 seconds)
2026-01-25 17:28:14,150 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:28:14,623 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 122.39085 ± 91.518
2026-01-25 17:28:14,623 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [20.692314, 55.484474, 24.866488, 29.244835, 26.432802, 206.77, 211.73502, 214.86151, 220.02567, 213.79547]
2026-01-25 17:28:14,623 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [20.0, 42.0, 20.0, 23.0, 24.0, 90.0, 92.0, 93.0, 95.0, 93.0]
2026-01-25 17:28:14,627 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 19/100 (estimated time remaining: 1 hour, 56 minutes, 54 seconds)
2026-01-25 17:29:39,155 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:29:39,881 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 211.47366 ± 6.900
2026-01-25 17:29:39,881 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [211.7171, 219.73993, 197.13655, 213.73062, 207.4352, 202.9343, 215.97586, 215.31493, 219.9806, 210.77138]
2026-01-25 17:29:39,881 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [92.0, 95.0, 87.0, 92.0, 90.0, 89.0, 93.0, 93.0, 95.0, 91.0]
2026-01-25 17:29:39,881 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1274 [INFO]: New best (211.47) for latency DatasetOffice
2026-01-25 17:29:39,886 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 20/100 (estimated time remaining: 1 hour, 55 minutes, 31 seconds)
2026-01-25 17:31:04,203 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:31:04,955 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 214.43283 ± 3.896
2026-01-25 17:31:04,955 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [214.50168, 217.15457, 209.4312, 215.28767, 212.4041, 218.52428, 217.61432, 205.80359, 215.82307, 217.78384]
2026-01-25 17:31:04,955 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [93.0, 94.0, 91.0, 93.0, 92.0, 94.0, 94.0, 90.0, 94.0, 94.0]
2026-01-25 17:31:04,955 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1274 [INFO]: New best (214.43) for latency DatasetOffice
2026-01-25 17:31:04,959 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 21/100 (estimated time remaining: 1 hour, 53 minutes, 58 seconds)
2026-01-25 17:32:29,851 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:32:30,594 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 212.39690 ± 6.243
2026-01-25 17:32:30,594 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [214.74608, 216.79434, 209.61876, 215.80379, 213.48582, 209.07504, 195.67332, 214.45317, 218.58862, 215.73001]
2026-01-25 17:32:30,594 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [93.0, 94.0, 91.0, 93.0, 92.0, 91.0, 87.0, 93.0, 95.0, 93.0]
2026-01-25 17:32:30,600 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 22/100 (estimated time remaining: 1 hour, 52 minutes, 23 seconds)
2026-01-25 17:33:56,136 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:33:56,873 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 211.70288 ± 2.962
2026-01-25 17:33:56,873 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [212.469, 212.83144, 206.74506, 208.86668, 210.64496, 211.7032, 208.154, 213.90236, 216.1852, 215.52707]
2026-01-25 17:33:56,873 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [92.0, 93.0, 90.0, 91.0, 91.0, 92.0, 91.0, 93.0, 94.0, 93.0]
2026-01-25 17:33:56,877 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 23/100 (estimated time remaining: 1 hour, 51 minutes, 9 seconds)
2026-01-25 17:35:20,819 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:35:21,565 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 216.26094 ± 2.888
2026-01-25 17:35:21,565 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [210.43391, 220.05038, 213.75452, 215.39792, 212.84326, 218.06961, 218.21666, 217.3403, 217.93083, 218.57193]
2026-01-25 17:35:21,565 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [91.0, 95.0, 92.0, 93.0, 92.0, 94.0, 94.0, 94.0, 94.0, 94.0]
2026-01-25 17:35:21,565 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1274 [INFO]: New best (216.26) for latency DatasetOffice
2026-01-25 17:35:21,570 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 24/100 (estimated time remaining: 1 hour, 49 minutes, 34 seconds)
2026-01-25 17:36:46,580 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:36:47,319 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 213.05374 ± 7.414
2026-01-25 17:36:47,319 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [219.19319, 218.40205, 215.16846, 207.73604, 215.30687, 219.96332, 216.4859, 194.36339, 216.10297, 207.81528]
2026-01-25 17:36:47,319 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [94.0, 94.0, 92.0, 90.0, 92.0, 94.0, 93.0, 86.0, 93.0, 90.0]
2026-01-25 17:36:47,323 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 25/100 (estimated time remaining: 1 hour, 48 minutes, 17 seconds)
2026-01-25 17:38:12,433 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:38:13,173 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 214.06238 ± 4.329
2026-01-25 17:38:13,173 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [213.0354, 217.987, 207.32512, 210.39734, 211.40717, 213.81845, 209.65488, 219.68788, 221.07977, 216.23068]
2026-01-25 17:38:13,174 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [92.0, 94.0, 90.0, 91.0, 91.0, 92.0, 91.0, 94.0, 95.0, 93.0]
2026-01-25 17:38:13,181 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 26/100 (estimated time remaining: 1 hour, 47 minutes, 3 seconds)
2026-01-25 17:39:37,641 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:39:38,348 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 193.44363 ± 43.283
2026-01-25 17:39:38,348 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [125.48568, 97.25345, 178.70996, 215.3334, 216.02545, 221.30853, 222.24005, 216.97495, 216.79843, 224.3064]
2026-01-25 17:39:38,348 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [64.0, 61.0, 87.0, 93.0, 93.0, 95.0, 95.0, 94.0, 94.0, 96.0]
2026-01-25 17:39:38,353 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 27/100 (estimated time remaining: 1 hour, 45 minutes, 30 seconds)
2026-01-25 17:41:02,759 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:41:03,457 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 198.16469 ± 24.591
2026-01-25 17:41:03,457 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [210.60228, 204.32584, 125.44179, 202.45454, 198.97133, 208.08904, 209.8072, 200.97221, 211.1696, 209.81303]
2026-01-25 17:41:03,457 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [91.0, 89.0, 71.0, 88.0, 87.0, 90.0, 91.0, 88.0, 92.0, 91.0]
2026-01-25 17:41:03,462 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 28/100 (estimated time remaining: 1 hour, 43 minutes, 48 seconds)
2026-01-25 17:42:27,453 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:42:28,187 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 213.30148 ± 2.908
2026-01-25 17:42:28,187 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [215.76897, 207.10846, 214.06111, 216.28645, 217.27837, 211.042, 212.99779, 210.47398, 214.58252, 213.41518]
2026-01-25 17:42:28,187 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [93.0, 90.0, 92.0, 93.0, 93.0, 91.0, 92.0, 91.0, 93.0, 92.0]
2026-01-25 17:42:28,194 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 29/100 (estimated time remaining: 1 hour, 42 minutes, 23 seconds)
2026-01-25 17:43:52,666 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:43:53,395 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 210.45865 ± 4.012
2026-01-25 17:43:53,395 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [205.60223, 213.8898, 213.16508, 213.09407, 201.09116, 212.28514, 208.00754, 211.89107, 213.62341, 211.93697]
2026-01-25 17:43:53,395 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [90.0, 93.0, 92.0, 92.0, 88.0, 92.0, 91.0, 92.0, 93.0, 92.0]
2026-01-25 17:43:53,400 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 30/100 (estimated time remaining: 1 hour, 40 minutes, 50 seconds)
2026-01-25 17:45:18,569 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:45:19,318 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 213.80624 ± 4.058
2026-01-25 17:45:19,318 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [218.02187, 214.51344, 207.93921, 210.63156, 214.09407, 218.82349, 213.1526, 206.50612, 217.17589, 217.20421]
2026-01-25 17:45:19,319 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [94.0, 93.0, 90.0, 91.0, 92.0, 94.0, 92.0, 90.0, 94.0, 94.0]
2026-01-25 17:45:19,324 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 31/100 (estimated time remaining: 1 hour, 39 minutes, 26 seconds)
2026-01-25 17:46:44,127 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:46:44,857 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 213.48093 ± 8.477
2026-01-25 17:46:44,857 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [216.18367, 222.485, 194.70581, 217.98512, 202.58005, 215.77934, 212.83871, 210.2186, 223.09508, 218.93788]
2026-01-25 17:46:44,857 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [93.0, 95.0, 86.0, 93.0, 88.0, 93.0, 92.0, 91.0, 96.0, 94.0]
2026-01-25 17:46:44,864 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 32/100 (estimated time remaining: 1 hour, 38 minutes, 5 seconds)
2026-01-25 17:48:09,443 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:48:10,199 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 215.99214 ± 5.456
2026-01-25 17:48:10,199 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [224.85545, 222.08421, 214.03705, 213.05867, 214.9394, 215.17485, 203.37463, 216.70433, 218.62254, 217.07016]
2026-01-25 17:48:10,199 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [96.0, 95.0, 92.0, 92.0, 92.0, 92.0, 89.0, 93.0, 94.0, 93.0]
2026-01-25 17:48:10,204 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 33/100 (estimated time remaining: 1 hour, 36 minutes, 43 seconds)
2026-01-25 17:49:35,309 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:49:36,058 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 220.26704 ± 6.716
2026-01-25 17:49:36,058 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [213.35234, 215.61708, 218.18478, 214.54459, 222.7602, 226.73087, 210.50366, 226.72873, 221.40459, 232.84352]
2026-01-25 17:49:36,058 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [92.0, 93.0, 93.0, 92.0, 95.0, 96.0, 91.0, 96.0, 95.0, 98.0]
2026-01-25 17:49:36,058 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1274 [INFO]: New best (220.27) for latency DatasetOffice
2026-01-25 17:49:36,081 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 34/100 (estimated time remaining: 1 hour, 35 minutes, 33 seconds)
2026-01-25 17:51:00,566 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:51:01,315 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 220.59663 ± 4.557
2026-01-25 17:51:01,315 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [223.43816, 218.18074, 224.33002, 211.87447, 218.4802, 227.69238, 217.15703, 216.89043, 224.54733, 223.37543]
2026-01-25 17:51:01,315 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [95.0, 94.0, 95.0, 91.0, 93.0, 96.0, 93.0, 93.0, 96.0, 95.0]
2026-01-25 17:51:01,315 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1274 [INFO]: New best (220.60) for latency DatasetOffice
2026-01-25 17:51:01,320 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 34 minutes, 8 seconds)
2026-01-25 17:52:26,804 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:52:27,579 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 225.08443 ± 2.176
2026-01-25 17:52:27,579 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [227.1742, 229.42786, 225.69789, 224.25365, 223.75868, 220.86139, 225.96065, 223.83809, 225.68094, 224.19093]
2026-01-25 17:52:27,579 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [96.0, 98.0, 96.0, 95.0, 94.0, 94.0, 96.0, 95.0, 96.0, 95.0]
2026-01-25 17:52:27,579 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1274 [INFO]: New best (225.08) for latency DatasetOffice
2026-01-25 17:52:27,584 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 32 minutes, 47 seconds)
2026-01-25 17:53:52,521 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:53:53,295 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 222.32002 ± 3.572
2026-01-25 17:53:53,295 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [227.87868, 221.91667, 219.91875, 223.86478, 220.17651, 224.41353, 219.05367, 216.33908, 221.62144, 228.01715]
2026-01-25 17:53:53,295 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [97.0, 95.0, 94.0, 95.0, 94.0, 96.0, 94.0, 93.0, 95.0, 97.0]
2026-01-25 17:53:53,301 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 31 minutes, 23 seconds)
2026-01-25 17:55:17,973 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:55:18,691 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 198.62778 ± 57.960
2026-01-25 17:55:18,691 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [233.12875, 219.67683, 213.91927, 229.40105, 227.29523, 134.12433, 46.34811, 235.85278, 221.22029, 225.31113]
2026-01-25 17:55:18,691 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [97.0, 95.0, 92.0, 96.0, 96.0, 69.0, 37.0, 117.0, 95.0, 96.0]
2026-01-25 17:55:18,697 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 29 minutes, 59 seconds)
2026-01-25 17:56:44,005 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:56:44,717 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 202.18298 ± 57.486
2026-01-25 17:56:44,717 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [217.38583, 222.50525, 224.41411, 228.20831, 223.97423, 219.0136, 30.721785, 206.85031, 230.16295, 218.59334]
2026-01-25 17:56:44,717 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [93.0, 95.0, 95.0, 96.0, 95.0, 93.0, 23.0, 90.0, 98.0, 95.0]
2026-01-25 17:56:44,725 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 28 minutes, 35 seconds)
2026-01-25 17:58:10,249 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:58:10,932 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 192.56525 ± 91.900
2026-01-25 17:58:10,932 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [8.0459, 12.598593, 254.57253, 230.65587, 226.26102, 252.95766, 219.95514, 224.63701, 251.25427, 244.7145]
2026-01-25 17:58:10,932 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [9.0, 13.0, 123.0, 98.0, 98.0, 105.0, 96.0, 97.0, 104.0, 101.0]
2026-01-25 17:58:10,937 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 27 minutes, 21 seconds)
2026-01-25 17:59:35,986 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 17:59:36,782 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 231.47977 ± 8.698
2026-01-25 17:59:36,782 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [236.03104, 216.78297, 245.87544, 237.22453, 224.12881, 222.18893, 239.0161, 236.76369, 232.60333, 224.18297]
2026-01-25 17:59:36,782 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [101.0, 94.0, 103.0, 100.0, 95.0, 96.0, 100.0, 100.0, 100.0, 96.0]
2026-01-25 17:59:36,782 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1274 [INFO]: New best (231.48) for latency DatasetOffice
2026-01-25 17:59:36,789 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 25 minutes, 50 seconds)
2026-01-25 18:01:01,716 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:01:02,508 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 236.71640 ± 11.830
2026-01-25 18:01:02,508 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [239.33105, 254.13983, 237.4854, 241.40285, 219.66524, 222.72173, 218.83545, 238.09784, 244.99353, 250.4913]
2026-01-25 18:01:02,508 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [102.0, 106.0, 100.0, 101.0, 95.0, 96.0, 95.0, 100.0, 103.0, 105.0]
2026-01-25 18:01:02,508 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1274 [INFO]: New best (236.72) for latency DatasetOffice
2026-01-25 18:01:02,514 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 24 minutes, 24 seconds)
2026-01-25 18:02:26,991 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:02:27,816 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 241.47824 ± 79.233
2026-01-25 18:02:27,816 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [282.8539, 288.27255, 256.23654, 254.04453, 264.19504, 270.08673, 243.82005, 272.15973, 6.925408, 276.18832]
2026-01-25 18:02:27,816 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [118.0, 117.0, 112.0, 108.0, 113.0, 117.0, 102.0, 111.0, 8.0, 113.0]
2026-01-25 18:02:27,816 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1274 [INFO]: New best (241.48) for latency DatasetOffice
2026-01-25 18:02:27,822 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 22 minutes, 57 seconds)
2026-01-25 18:03:52,242 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:03:53,104 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 260.12054 ± 11.121
2026-01-25 18:03:53,104 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [272.23926, 273.4382, 260.15118, 250.72997, 275.33353, 270.3892, 251.93872, 254.36198, 248.0392, 244.58434]
2026-01-25 18:03:53,104 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [112.0, 112.0, 108.0, 106.0, 112.0, 110.0, 106.0, 107.0, 105.0, 104.0]
2026-01-25 18:03:53,104 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1274 [INFO]: New best (260.12) for latency DatasetOffice
2026-01-25 18:03:53,109 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 21 minutes, 23 seconds)
2026-01-25 18:05:17,727 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:05:18,592 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 264.95148 ± 14.841
2026-01-25 18:05:18,592 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [278.01276, 254.7894, 272.46796, 269.36752, 273.82227, 237.10005, 277.4703, 273.85687, 273.82288, 238.80493]
2026-01-25 18:05:18,593 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [114.0, 107.0, 112.0, 110.0, 112.0, 100.0, 114.0, 112.0, 113.0, 102.0]
2026-01-25 18:05:18,593 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1274 [INFO]: New best (264.95) for latency DatasetOffice
2026-01-25 18:05:18,599 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 19 minutes, 49 seconds)
2026-01-25 18:06:42,743 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:06:43,459 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 211.68387 ± 97.329
2026-01-25 18:06:43,459 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [271.39835, 263.68332, 276.8166, 266.88086, 165.79483, 48.491283, 7.967073, 274.34268, 274.80832, 266.65524]
2026-01-25 18:06:43,459 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [113.0, 109.0, 112.0, 110.0, 78.0, 32.0, 9.0, 113.0, 113.0, 111.0]
2026-01-25 18:06:43,466 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 18 minutes, 13 seconds)
2026-01-25 18:08:08,048 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:08:08,872 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 248.51936 ± 68.103
2026-01-25 18:08:08,872 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [270.52515, 276.78076, 275.1856, 286.4331, 260.3194, 271.20392, 269.8311, 271.0692, 258.3898, 45.455517]
2026-01-25 18:08:08,872 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [112.0, 112.0, 113.0, 116.0, 109.0, 112.0, 112.0, 112.0, 109.0, 30.0]
2026-01-25 18:08:08,880 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 16 minutes, 44 seconds)
2026-01-25 18:09:33,788 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:09:34,528 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 226.04904 ± 110.204
2026-01-25 18:09:34,528 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [281.93808, 8.022508, 275.84772, 295.7052, 284.10062, 7.943327, 242.38724, 288.724, 306.1964, 269.62543]
2026-01-25 18:09:34,528 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [116.0, 9.0, 113.0, 119.0, 116.0, 9.0, 102.0, 118.0, 124.0, 112.0]
2026-01-25 18:09:34,534 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 15 minutes, 23 seconds)
2026-01-25 18:10:58,210 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:10:59,129 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 276.64313 ± 11.188
2026-01-25 18:10:59,129 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [301.77774, 284.65424, 271.12198, 271.7025, 274.71875, 280.5821, 276.3475, 254.78288, 274.9164, 275.82748]
2026-01-25 18:10:59,129 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [123.0, 117.0, 111.0, 114.0, 114.0, 116.0, 111.0, 106.0, 113.0, 114.0]
2026-01-25 18:10:59,129 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1274 [INFO]: New best (276.64) for latency DatasetOffice
2026-01-25 18:10:59,136 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 13 minutes, 50 seconds)
2026-01-25 18:12:25,261 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:12:26,009 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 218.69104 ± 102.203
2026-01-25 18:12:26,009 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [268.6936, 261.53247, 224.66235, 19.507334, 15.602424, 255.6179, 288.32233, 280.58484, 284.26154, 288.12573]
2026-01-25 18:12:26,009 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [113.0, 110.0, 99.0, 18.0, 15.0, 107.0, 119.0, 116.0, 118.0, 117.0]
2026-01-25 18:12:26,018 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 12 minutes, 39 seconds)
2026-01-25 18:13:50,067 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:13:50,997 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 287.59167 ± 4.765
2026-01-25 18:13:50,997 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [294.37576, 288.8631, 288.76807, 282.85812, 277.7735, 293.67624, 284.27664, 289.47327, 285.9959, 289.85617]
2026-01-25 18:13:50,997 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [118.0, 117.0, 117.0, 115.0, 115.0, 119.0, 118.0, 117.0, 117.0, 117.0]
2026-01-25 18:13:50,997 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1274 [INFO]: New best (287.59) for latency DatasetOffice
2026-01-25 18:13:51,004 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 11 minutes, 15 seconds)
2026-01-25 18:15:16,007 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:15:16,902 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 280.90228 ± 8.299
2026-01-25 18:15:16,902 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [292.25226, 281.2544, 282.45892, 284.39426, 274.03033, 275.8365, 292.98294, 266.69705, 271.78275, 287.3336]
2026-01-25 18:15:16,902 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [115.0, 113.0, 112.0, 113.0, 110.0, 110.0, 116.0, 108.0, 111.0, 114.0]
2026-01-25 18:15:16,910 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 9 minutes, 54 seconds)
2026-01-25 18:16:41,254 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:16:42,163 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 282.77863 ± 12.163
2026-01-25 18:16:42,163 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [284.04053, 300.7283, 259.32657, 296.2013, 272.34155, 268.78958, 280.75714, 287.8215, 291.7669, 286.0132]
2026-01-25 18:16:42,163 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [116.0, 120.0, 108.0, 118.0, 110.0, 110.0, 114.0, 115.0, 117.0, 115.0]
2026-01-25 18:16:42,172 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 8 minutes, 25 seconds)
2026-01-25 18:18:06,681 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:18:07,580 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 278.80789 ± 13.969
2026-01-25 18:18:07,581 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [285.38635, 285.6205, 289.96344, 287.79022, 270.01685, 292.14304, 282.3278, 242.80159, 269.80795, 282.2212]
2026-01-25 18:18:07,581 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [115.0, 116.0, 117.0, 115.0, 111.0, 117.0, 115.0, 101.0, 111.0, 115.0]
2026-01-25 18:18:07,587 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 7 minutes, 7 seconds)
2026-01-25 18:19:32,692 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:19:33,542 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 261.39481 ± 69.596
2026-01-25 18:19:33,542 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [285.65512, 296.16473, 295.12512, 289.60684, 264.9333, 281.20038, 282.84433, 279.37982, 284.92017, 54.118168]
2026-01-25 18:19:33,542 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [115.0, 118.0, 117.0, 117.0, 108.0, 115.0, 116.0, 114.0, 117.0, 35.0]
2026-01-25 18:19:33,548 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 5 minutes, 33 seconds)
2026-01-25 18:20:57,985 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:20:58,883 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 277.01141 ± 4.416
2026-01-25 18:20:58,883 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [276.1325, 283.90033, 269.05942, 280.20737, 276.48227, 277.21408, 275.94333, 274.5002, 272.76675, 283.90805]
2026-01-25 18:20:58,883 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [113.0, 114.0, 111.0, 114.0, 114.0, 113.0, 113.0, 112.0, 111.0, 116.0]
2026-01-25 18:20:58,889 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 4 minutes, 10 seconds)
2026-01-25 18:22:22,559 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:22:23,453 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 274.59253 ± 49.913
2026-01-25 18:22:23,453 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [289.34634, 295.24097, 278.94043, 126.545456, 281.98367, 304.52853, 284.87704, 289.17383, 297.06107, 298.22794]
2026-01-25 18:22:23,453 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [118.0, 119.0, 114.0, 64.0, 115.0, 128.0, 117.0, 117.0, 119.0, 121.0]
2026-01-25 18:22:23,459 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 2 minutes, 33 seconds)
2026-01-25 18:23:47,546 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:23:48,460 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 282.91647 ± 5.428
2026-01-25 18:23:48,460 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [285.79935, 285.4858, 275.62064, 284.34454, 276.1382, 280.60596, 284.94446, 287.3042, 292.97528, 275.94635]
2026-01-25 18:23:48,460 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [116.0, 117.0, 114.0, 115.0, 111.0, 115.0, 116.0, 117.0, 118.0, 113.0]
2026-01-25 18:23:48,469 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 1 minute, 6 seconds)
2026-01-25 18:25:13,255 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:25:14,181 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 284.55325 ± 5.628
2026-01-25 18:25:14,181 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [288.82193, 278.9424, 288.63586, 291.6476, 274.79495, 287.66025, 282.60785, 278.76492, 281.95325, 291.70337]
2026-01-25 18:25:14,181 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [117.0, 113.0, 116.0, 117.0, 112.0, 116.0, 114.0, 113.0, 115.0, 117.0]
2026-01-25 18:25:14,190 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 59/100 (estimated time remaining: 59 minutes, 43 seconds)
2026-01-25 18:26:39,805 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:26:40,737 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 293.41440 ± 4.150
2026-01-25 18:26:40,737 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [293.84448, 295.34747, 294.67343, 294.29968, 289.35568, 289.70206, 292.56763, 289.36337, 290.9649, 304.0253]
2026-01-25 18:26:40,737 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [118.0, 120.0, 119.0, 118.0, 117.0, 116.0, 118.0, 117.0, 119.0, 122.0]
2026-01-25 18:26:40,737 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1274 [INFO]: New best (293.41) for latency DatasetOffice
2026-01-25 18:26:40,744 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 60/100 (estimated time remaining: 58 minutes, 23 seconds)
2026-01-25 18:28:05,649 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:28:06,544 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 277.28281 ± 5.605
2026-01-25 18:28:06,544 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [268.75323, 279.30414, 277.41934, 275.94998, 278.13586, 272.88425, 277.26184, 278.0399, 291.47382, 273.60562]
2026-01-25 18:28:06,544 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [111.0, 115.0, 113.0, 113.0, 114.0, 111.0, 113.0, 113.0, 119.0, 113.0]
2026-01-25 18:28:06,553 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 61/100 (estimated time remaining: 57 minutes, 1 second)
2026-01-25 18:29:30,915 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:29:31,752 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 250.37427 ± 94.285
2026-01-25 18:29:31,753 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [122.46834, 14.29067, 297.34387, 291.12546, 292.73837, 291.2635, 307.9466, 298.99, 297.01624, 290.5596]
2026-01-25 18:29:31,753 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [63.0, 15.0, 120.0, 117.0, 119.0, 118.0, 123.0, 121.0, 121.0, 119.0]
2026-01-25 18:29:31,762 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 62/100 (estimated time remaining: 55 minutes, 40 seconds)
2026-01-25 18:30:57,158 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:30:58,110 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 296.52069 ± 8.624
2026-01-25 18:30:58,110 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [304.69305, 305.84686, 283.1805, 292.86813, 294.35837, 300.29684, 293.69455, 281.22507, 302.31973, 306.72345]
2026-01-25 18:30:58,110 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [122.0, 123.0, 115.0, 119.0, 119.0, 121.0, 119.0, 116.0, 123.0, 122.0]
2026-01-25 18:30:58,110 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1274 [INFO]: New best (296.52) for latency DatasetOffice
2026-01-25 18:30:58,116 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 63/100 (estimated time remaining: 54 minutes, 25 seconds)
2026-01-25 18:32:22,905 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:32:23,843 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 290.05664 ± 4.943
2026-01-25 18:32:23,843 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [287.02545, 291.8852, 294.30972, 280.63898, 292.3446, 284.80527, 293.5891, 285.3784, 294.07736, 296.51224]
2026-01-25 18:32:23,843 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [117.0, 119.0, 119.0, 116.0, 117.0, 116.0, 119.0, 117.0, 120.0, 120.0]
2026-01-25 18:32:23,849 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 64/100 (estimated time remaining: 52 minutes, 59 seconds)
2026-01-25 18:33:48,668 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:33:49,609 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 284.53278 ± 9.041
2026-01-25 18:33:49,609 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [294.3746, 284.719, 272.1706, 266.1803, 287.72388, 280.19122, 282.32825, 292.9462, 293.78006, 290.9135]
2026-01-25 18:33:49,609 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [119.0, 117.0, 110.0, 110.0, 117.0, 113.0, 115.0, 119.0, 119.0, 118.0]
2026-01-25 18:33:49,615 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 65/100 (estimated time remaining: 51 minutes, 27 seconds)
2026-01-25 18:35:14,308 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:35:15,229 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 284.79770 ± 13.737
2026-01-25 18:35:15,229 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [286.08716, 288.81006, 288.4573, 295.56668, 288.63867, 287.06454, 288.89276, 244.21933, 290.202, 290.03882]
2026-01-25 18:35:15,229 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [115.0, 117.0, 116.0, 120.0, 116.0, 117.0, 118.0, 103.0, 118.0, 118.0]
2026-01-25 18:35:15,237 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 66/100 (estimated time remaining: 50 minutes)
2026-01-25 18:36:40,118 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:36:41,054 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 292.52728 ± 7.777
2026-01-25 18:36:41,054 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [291.05917, 302.8563, 284.19037, 296.55658, 299.11444, 276.10852, 298.02014, 294.45825, 285.93002, 296.97906]
2026-01-25 18:36:41,054 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [117.0, 123.0, 116.0, 119.0, 119.0, 112.0, 120.0, 119.0, 118.0, 120.0]
2026-01-25 18:36:41,063 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 67/100 (estimated time remaining: 48 minutes, 39 seconds)
2026-01-25 18:38:06,834 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:38:07,755 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 282.96942 ± 37.080
2026-01-25 18:38:07,755 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [171.94035, 297.48105, 289.19254, 294.74396, 294.98416, 296.76593, 295.8402, 294.93185, 296.02866, 297.7856]
2026-01-25 18:38:07,755 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [80.0, 120.0, 116.0, 116.0, 118.0, 117.0, 118.0, 118.0, 120.0, 120.0]
2026-01-25 18:38:07,761 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 68/100 (estimated time remaining: 47 minutes, 15 seconds)
2026-01-25 18:39:31,864 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:39:32,814 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 293.41718 ± 4.765
2026-01-25 18:39:32,814 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [293.12912, 287.152, 297.04138, 287.61038, 292.42484, 295.68665, 292.81104, 290.10452, 293.68875, 304.523]
2026-01-25 18:39:32,814 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [119.0, 117.0, 119.0, 115.0, 117.0, 118.0, 118.0, 116.0, 119.0, 121.0]
2026-01-25 18:39:32,820 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 69/100 (estimated time remaining: 45 minutes, 45 seconds)
2026-01-25 18:40:57,373 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:40:58,295 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 291.53470 ± 5.684
2026-01-25 18:40:58,295 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [292.86578, 299.15384, 293.91983, 286.10275, 289.06546, 287.72562, 288.75165, 293.0645, 282.44788, 302.24966]
2026-01-25 18:40:58,295 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [119.0, 121.0, 117.0, 115.0, 115.0, 117.0, 116.0, 116.0, 114.0, 121.0]
2026-01-25 18:40:58,304 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 70/100 (estimated time remaining: 44 minutes, 17 seconds)
2026-01-25 18:42:21,757 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:42:22,527 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 241.33749 ± 101.627
2026-01-25 18:42:22,527 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [293.18402, 289.96777, 280.6642, 30.223429, 46.629135, 290.5848, 295.6518, 299.72852, 294.0298, 292.71133]
2026-01-25 18:42:22,527 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [117.0, 116.0, 113.0, 22.0, 28.0, 116.0, 118.0, 120.0, 119.0, 117.0]
2026-01-25 18:42:22,535 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 71/100 (estimated time remaining: 42 minutes, 43 seconds)
2026-01-25 18:43:45,699 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:43:46,634 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 296.64914 ± 3.912
2026-01-25 18:43:46,634 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [295.273, 293.92358, 297.23245, 292.20535, 299.87863, 296.6187, 304.35068, 294.55527, 291.2719, 301.18198]
2026-01-25 18:43:46,634 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [120.0, 120.0, 118.0, 120.0, 120.0, 120.0, 121.0, 118.0, 119.0, 120.0]
2026-01-25 18:43:46,634 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1274 [INFO]: New best (296.65) for latency DatasetOffice
2026-01-25 18:43:46,641 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 72/100 (estimated time remaining: 41 minutes, 8 seconds)
2026-01-25 18:45:10,200 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:45:11,151 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 300.49966 ± 5.185
2026-01-25 18:45:11,152 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [309.36792, 308.5601, 291.81854, 302.54745, 294.30505, 300.77396, 299.2095, 300.31088, 299.87027, 298.23288]
2026-01-25 18:45:11,152 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [124.0, 121.0, 119.0, 121.0, 119.0, 120.0, 118.0, 121.0, 122.0, 118.0]
2026-01-25 18:45:11,152 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1274 [INFO]: New best (300.50) for latency DatasetOffice
2026-01-25 18:45:11,159 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 73/100 (estimated time remaining: 39 minutes, 31 seconds)
2026-01-25 18:46:33,828 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:46:34,764 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 303.17480 ± 4.979
2026-01-25 18:46:34,764 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [311.03528, 299.37808, 297.41898, 309.1609, 309.99808, 303.55402, 297.56424, 303.7722, 298.66812, 301.198]
2026-01-25 18:46:34,764 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [123.0, 120.0, 118.0, 123.0, 121.0, 122.0, 120.0, 121.0, 121.0, 121.0]
2026-01-25 18:46:34,764 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1274 [INFO]: New best (303.17) for latency DatasetOffice
2026-01-25 18:46:34,772 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 74/100 (estimated time remaining: 37 minutes, 58 seconds)
2026-01-25 18:47:58,704 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:47:59,660 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 305.16278 ± 8.978
2026-01-25 18:47:59,660 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [306.13998, 309.38763, 291.01086, 307.91235, 292.60367, 319.4961, 297.20462, 301.3612, 311.33652, 315.17517]
2026-01-25 18:47:59,660 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [122.0, 123.0, 118.0, 122.0, 117.0, 123.0, 120.0, 118.0, 123.0, 123.0]
2026-01-25 18:47:59,660 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1274 [INFO]: New best (305.16) for latency DatasetOffice
2026-01-25 18:47:59,667 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 75/100 (estimated time remaining: 36 minutes, 31 seconds)
2026-01-25 18:49:21,820 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:49:22,766 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 302.98517 ± 11.402
2026-01-25 18:49:22,767 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [282.4055, 293.23685, 317.9713, 299.84488, 308.2774, 311.2559, 310.18808, 286.95868, 304.98904, 314.7241]
2026-01-25 18:49:22,767 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [118.0, 120.0, 126.0, 120.0, 124.0, 125.0, 125.0, 116.0, 123.0, 126.0]
2026-01-25 18:49:22,774 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 76/100 (estimated time remaining: 35 minutes, 1 second)
2026-01-25 18:50:45,674 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:50:46,613 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 300.52975 ± 6.919
2026-01-25 18:50:46,613 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [304.42496, 298.50735, 301.75867, 295.4419, 289.93262, 304.5502, 307.11978, 303.22348, 311.40988, 288.92865]
2026-01-25 18:50:46,613 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [122.0, 120.0, 120.0, 119.0, 118.0, 123.0, 124.0, 122.0, 125.0, 117.0]
2026-01-25 18:50:46,621 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 77/100 (estimated time remaining: 33 minutes, 35 seconds)
2026-01-25 18:52:08,292 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:52:09,043 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 229.03909 ± 114.135
2026-01-25 18:52:09,043 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [45.227734, 298.30576, 95.05162, 29.03678, 305.6248, 299.25653, 303.331, 294.4335, 312.0616, 308.06143]
2026-01-25 18:52:09,043 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [31.0, 121.0, 54.0, 29.0, 123.0, 120.0, 122.0, 118.0, 123.0, 123.0]
2026-01-25 18:52:09,052 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 78/100 (estimated time remaining: 32 minutes, 2 seconds)
2026-01-25 18:53:31,703 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:53:32,564 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 272.87976 ± 77.488
2026-01-25 18:53:32,564 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [40.51561, 299.0086, 294.8976, 297.01773, 300.45694, 297.80267, 303.70786, 297.86197, 297.27655, 300.25198]
2026-01-25 18:53:32,564 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [28.0, 121.0, 118.0, 120.0, 120.0, 121.0, 122.0, 120.0, 120.0, 120.0]
2026-01-25 18:53:32,574 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 79/100 (estimated time remaining: 30 minutes, 38 seconds)
2026-01-25 18:54:56,205 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:54:57,077 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 273.32208 ± 80.489
2026-01-25 18:54:57,077 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [306.0857, 225.63007, 303.02158, 295.2676, 304.0987, 309.92276, 316.56293, 44.063267, 313.4782, 315.0901]
2026-01-25 18:54:57,077 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [121.0, 100.0, 120.0, 120.0, 119.0, 122.0, 125.0, 26.0, 123.0, 123.0]
2026-01-25 18:54:57,084 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 80/100 (estimated time remaining: 29 minutes, 13 seconds)
2026-01-25 18:56:27,675 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:56:28,637 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 301.57974 ± 4.779
2026-01-25 18:56:28,637 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [309.06635, 306.39957, 306.26218, 301.84558, 301.8812, 300.26318, 294.91595, 296.84683, 303.9947, 294.3218]
2026-01-25 18:56:28,637 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [124.0, 122.0, 123.0, 121.0, 121.0, 121.0, 119.0, 121.0, 122.0, 120.0]
2026-01-25 18:56:28,648 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 81/100 (estimated time remaining: 28 minutes, 23 seconds)
2026-01-25 18:57:58,537 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:57:59,525 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 310.83801 ± 8.321
2026-01-25 18:57:59,526 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [307.32355, 317.07928, 302.87717, 319.28137, 313.44217, 326.2638, 311.21396, 295.8881, 304.687, 310.3238]
2026-01-25 18:57:59,526 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [123.0, 126.0, 121.0, 126.0, 124.0, 127.0, 124.0, 120.0, 123.0, 124.0]
2026-01-25 18:57:59,526 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1274 [INFO]: New best (310.84) for latency DatasetOffice
2026-01-25 18:57:59,533 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 82/100 (estimated time remaining: 27 minutes, 25 seconds)
2026-01-25 18:59:30,307 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 18:59:31,257 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 301.41217 ± 5.167
2026-01-25 18:59:31,257 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [305.49078, 308.7444, 303.2687, 291.7466, 305.58005, 295.1511, 299.46304, 296.49902, 303.43765, 304.74036]
2026-01-25 18:59:31,257 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [123.0, 123.0, 120.0, 119.0, 121.0, 120.0, 121.0, 121.0, 122.0, 122.0]
2026-01-25 18:59:31,267 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 83/100 (estimated time remaining: 26 minutes, 31 seconds)
2026-01-25 19:00:56,130 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:00:57,107 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 318.40530 ± 6.088
2026-01-25 19:00:57,107 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [321.9378, 324.35226, 318.40546, 315.81525, 307.17004, 307.5619, 323.27872, 322.48627, 324.0215, 319.02368]
2026-01-25 19:00:57,108 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [126.0, 126.0, 123.0, 123.0, 122.0, 121.0, 125.0, 125.0, 126.0, 124.0]
2026-01-25 19:00:57,108 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1274 [INFO]: New best (318.41) for latency DatasetOffice
2026-01-25 19:00:57,116 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 84/100 (estimated time remaining: 25 minutes, 11 seconds)
2026-01-25 19:02:18,962 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:02:19,909 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 304.24387 ± 6.509
2026-01-25 19:02:19,909 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [313.30066, 304.69977, 309.1202, 290.50293, 307.4186, 306.933, 302.34323, 294.7291, 305.06693, 308.32452]
2026-01-25 19:02:19,909 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [125.0, 122.0, 124.0, 119.0, 122.0, 123.0, 122.0, 117.0, 124.0, 123.0]
2026-01-25 19:02:19,916 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 85/100 (estimated time remaining: 23 minutes, 37 seconds)
2026-01-25 19:03:42,437 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:03:43,345 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 291.88657 ± 11.002
2026-01-25 19:03:43,345 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [305.2168, 282.55533, 294.91708, 281.00964, 289.8616, 285.00876, 314.55896, 300.5292, 280.9083, 284.30038]
2026-01-25 19:03:43,345 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [121.0, 114.0, 119.0, 113.0, 117.0, 116.0, 124.0, 120.0, 114.0, 115.0]
2026-01-25 19:03:43,354 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 86/100 (estimated time remaining: 21 minutes, 44 seconds)
2026-01-25 19:05:05,908 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:05:06,713 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 252.91035 ± 108.707
2026-01-25 19:05:06,713 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [305.54913, 313.33838, 313.91904, 301.837, 44.389286, 27.18871, 305.2238, 306.6675, 300.69208, 310.2986]
2026-01-25 19:05:06,713 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [122.0, 124.0, 123.0, 120.0, 30.0, 34.0, 121.0, 122.0, 121.0, 123.0]
2026-01-25 19:05:06,721 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 87/100 (estimated time remaining: 19 minutes, 56 seconds)
2026-01-25 19:06:29,485 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:06:30,426 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 291.28506 ± 10.954
2026-01-25 19:06:30,426 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [296.19553, 268.87256, 272.09933, 295.93347, 299.37262, 303.6602, 298.76944, 293.98257, 291.25696, 292.70813]
2026-01-25 19:06:30,426 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [122.0, 111.0, 118.0, 120.0, 122.0, 124.0, 122.0, 120.0, 120.0, 120.0]
2026-01-25 19:06:30,438 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 88/100 (estimated time remaining: 18 minutes, 9 seconds)
2026-01-25 19:07:53,861 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:07:54,750 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 274.62662 ± 80.088
2026-01-25 19:07:54,750 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [307.6584, 301.69812, 295.8987, 315.1417, 302.64777, 298.31885, 291.2196, 301.788, 35.10309, 296.7921]
2026-01-25 19:07:54,750 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [123.0, 123.0, 121.0, 126.0, 122.0, 121.0, 121.0, 122.0, 30.0, 121.0]
2026-01-25 19:07:54,758 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 89/100 (estimated time remaining: 16 minutes, 42 seconds)
2026-01-25 19:09:17,032 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:09:17,989 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 317.84158 ± 6.279
2026-01-25 19:09:17,989 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [319.66895, 321.45755, 304.27582, 322.17236, 320.03958, 322.16733, 326.4681, 318.5166, 310.37625, 313.27316]
2026-01-25 19:09:17,989 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [125.0, 125.0, 119.0, 126.0, 125.0, 126.0, 127.0, 124.0, 123.0, 124.0]
2026-01-25 19:09:17,997 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 90/100 (estimated time remaining: 15 minutes, 19 seconds)
2026-01-25 19:10:40,292 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:10:41,264 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 316.99088 ± 4.488
2026-01-25 19:10:41,264 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [317.81235, 323.27335, 314.3461, 321.18698, 315.19888, 312.39542, 312.63953, 309.96454, 322.67426, 320.41733]
2026-01-25 19:10:41,264 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [126.0, 129.0, 125.0, 127.0, 125.0, 126.0, 126.0, 125.0, 128.0, 127.0]
2026-01-25 19:10:41,272 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 91/100 (estimated time remaining: 13 minutes, 55 seconds)
2026-01-25 19:12:04,249 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:12:05,260 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 320.36154 ± 4.722
2026-01-25 19:12:05,260 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [326.3805, 323.2674, 319.16534, 321.4062, 313.23465, 315.14062, 318.9821, 315.42133, 321.9579, 328.65936]
2026-01-25 19:12:05,260 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [129.0, 130.0, 127.0, 127.0, 126.0, 127.0, 128.0, 126.0, 130.0, 129.0]
2026-01-25 19:12:05,260 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1274 [INFO]: New best (320.36) for latency DatasetOffice
2026-01-25 19:12:05,272 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 92/100 (estimated time remaining: 12 minutes, 33 seconds)
2026-01-25 19:13:27,575 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:13:28,595 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 328.16162 ± 5.706
2026-01-25 19:13:28,595 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [324.8178, 332.25064, 323.26993, 327.09808, 324.83197, 323.83392, 329.81995, 326.5634, 325.7483, 343.38214]
2026-01-25 19:13:28,595 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [129.0, 129.0, 128.0, 129.0, 127.0, 129.0, 129.0, 129.0, 130.0, 132.0]
2026-01-25 19:13:28,595 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1274 [INFO]: New best (328.16) for latency DatasetOffice
2026-01-25 19:13:28,605 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 93/100 (estimated time remaining: 11 minutes, 9 seconds)
2026-01-25 19:14:51,021 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:14:51,990 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 311.88632 ± 16.126
2026-01-25 19:14:51,990 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [320.39215, 324.74323, 309.54865, 320.2961, 313.19385, 322.8663, 328.7535, 298.8522, 309.6223, 270.59488]
2026-01-25 19:14:51,991 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [126.0, 129.0, 122.0, 128.0, 126.0, 130.0, 130.0, 121.0, 125.0, 108.0]
2026-01-25 19:14:51,998 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 94/100 (estimated time remaining: 9 minutes, 44 seconds)
2026-01-25 19:16:15,280 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:16:16,285 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 319.27582 ± 6.477
2026-01-25 19:16:16,286 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [318.19293, 321.79932, 326.47302, 303.2275, 311.87717, 321.72733, 322.28662, 320.8318, 323.39133, 322.9513]
2026-01-25 19:16:16,286 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [128.0, 130.0, 129.0, 122.0, 126.0, 130.0, 130.0, 129.0, 131.0, 129.0]
2026-01-25 19:16:16,294 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 95/100 (estimated time remaining: 8 minutes, 21 seconds)
2026-01-25 19:17:38,443 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:17:39,223 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 240.80547 ± 125.501
2026-01-25 19:17:39,223 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [30.176352, 322.78284, 106.46714, 19.102886, 320.25537, 318.25385, 319.0753, 320.67786, 322.90186, 328.36108]
2026-01-25 19:17:39,223 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [24.0, 128.0, 59.0, 17.0, 127.0, 127.0, 127.0, 126.0, 127.0, 128.0]
2026-01-25 19:17:39,235 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 96/100 (estimated time remaining: 6 minutes, 57 seconds)
2026-01-25 19:19:01,950 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:19:02,961 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 334.33069 ± 8.491
2026-01-25 19:19:02,961 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [314.758, 336.82095, 336.6887, 329.96127, 325.56042, 334.94754, 343.06943, 345.5916, 338.37268, 337.53638]
2026-01-25 19:19:02,961 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [125.0, 128.0, 128.0, 126.0, 126.0, 128.0, 131.0, 131.0, 131.0, 130.0]
2026-01-25 19:19:02,961 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1274 [INFO]: New best (334.33) for latency DatasetOffice
2026-01-25 19:19:02,971 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 97/100 (estimated time remaining: 5 minutes, 34 seconds)
2026-01-25 19:20:26,023 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:20:27,002 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 314.55615 ± 44.629
2026-01-25 19:20:27,003 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [328.03064, 331.39407, 318.7837, 327.44534, 328.56177, 181.25117, 335.175, 331.59918, 330.88272, 332.43796]
2026-01-25 19:20:27,003 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [128.0, 129.0, 125.0, 128.0, 128.0, 84.0, 137.0, 129.0, 129.0, 129.0]
2026-01-25 19:20:27,017 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 98/100 (estimated time remaining: 4 minutes, 11 seconds)
2026-01-25 19:21:49,527 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:21:50,511 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 320.14383 ± 7.298
2026-01-25 19:21:50,511 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [332.13257, 324.92188, 311.5082, 309.8874, 316.4891, 315.37726, 323.585, 313.63052, 325.7005, 328.20612]
2026-01-25 19:21:50,511 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [130.0, 128.0, 125.0, 125.0, 125.0, 126.0, 128.0, 126.0, 129.0, 128.0]
2026-01-25 19:21:50,519 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 99/100 (estimated time remaining: 2 minutes, 47 seconds)
2026-01-25 19:23:13,205 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:23:14,197 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 330.41058 ± 3.061
2026-01-25 19:23:14,197 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [334.53656, 333.96158, 331.41266, 327.38022, 325.392, 329.71176, 329.96967, 326.75952, 330.75293, 334.22873]
2026-01-25 19:23:14,197 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [129.0, 129.0, 128.0, 127.0, 126.0, 128.0, 128.0, 128.0, 129.0, 129.0]
2026-01-25 19:23:14,207 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1247 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 23 seconds)
2026-01-25 19:24:37,231 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-25 19:24:38,259 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1269 [DEBUG]: Total Reward: 337.08759 ± 5.210
2026-01-25 19:24:38,259 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1270 [DEBUG]: All rewards: [340.98926, 345.6197, 328.06345, 337.57916, 330.29504, 334.7568, 333.72897, 336.9212, 342.0281, 340.8944]
2026-01-25 19:24:38,259 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [133.0, 134.0, 131.0, 132.0, 131.0, 132.0, 132.0, 132.0, 134.0, 132.0]
2026-01-25 19:24:38,259 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1274 [INFO]: New best (337.09) for latency DatasetOffice
2026-01-25 19:24:38,268 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-hopper):1299 [DEBUG]: Training session finished
