2025-05-10 11:28:56,863 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-sac-aug-mem16
2025-05-10 11:28:56,863 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-sac-aug-mem16
2025-05-10 11:28:56,863 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x7be3e4a3ff70>}
2025-05-10 11:28:56,863 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1111 [DEBUG]: using device: cpu
2025-05-10 11:28:56,863 baseline-sac-noisy-humanoid:77 [WARNING]: args.memorize_actions != args.horizon: 16 != 24
2025-05-10 11:28:56,878 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1133 [INFO]: Creating new trainer
2025-05-10 11:28:56,886 baseline-sac-noisy-humanoid:111 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=648, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-05-10 11:28:56,886 baseline-sac-noisy-humanoid:112 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=665, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-10 11:28:57,367 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1194 [DEBUG]: Starting training session...
2025-05-10 11:28:57,367 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 1/100
2025-05-10 11:32:30,155 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 11:32:31,629 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 309.35852 ± 20.400
2025-05-10 11:32:31,629 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [323.2189, 334.9338, 277.47586, 316.24533, 292.5793, 297.11255, 334.759, 299.26587, 285.83698, 332.1574]
2025-05-10 11:32:31,629 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [62.0, 64.0, 52.0, 62.0, 57.0, 60.0, 65.0, 59.0, 58.0, 65.0]
2025-05-10 11:32:31,629 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1226 [INFO]: New best (309.36) for latency MM1Queue_a033_s075
2025-05-10 11:32:31,630 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1229 [INFO]: saving network
2025-05-10 11:32:31,633 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 11:32:31,641 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 5 hours, 53 minutes, 33 seconds)
2025-05-10 11:36:27,652 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 11:36:28,545 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 201.59689 ± 21.653
2025-05-10 11:36:28,545 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [227.38934, 183.70483, 221.85426, 188.96556, 206.03015, 185.99295, 182.71527, 197.80237, 176.61345, 244.90065]
2025-05-10 11:36:28,545 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [44.0, 37.0, 44.0, 38.0, 41.0, 38.0, 37.0, 39.0, 36.0, 48.0]
2025-05-10 11:36:28,546 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 6 hours, 8 minutes, 27 seconds)
2025-05-10 11:40:22,132 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 11:40:22,804 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 147.45383 ± 7.472
2025-05-10 11:40:22,804 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [131.56204, 155.18079, 154.68777, 152.61324, 137.98334, 153.88354, 149.01723, 149.84001, 147.59947, 142.1708]
2025-05-10 11:40:22,804 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 32.0, 32.0, 32.0, 28.0, 32.0, 30.0, 31.0, 31.0, 29.0]
2025-05-10 11:40:22,805 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 6 hours, 9 minutes, 22 seconds)
2025-05-10 11:44:17,881 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 11:44:19,325 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 317.26639 ± 61.567
2025-05-10 11:44:19,325 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [307.41968, 354.98297, 300.01178, 358.7337, 376.08176, 386.2478, 386.41068, 219.65048, 227.7959, 255.32877]
2025-05-10 11:44:19,325 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [57.0, 66.0, 56.0, 67.0, 69.0, 72.0, 71.0, 43.0, 46.0, 50.0]
2025-05-10 11:44:19,325 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1226 [INFO]: New best (317.27) for latency MM1Queue_a033_s075
2025-05-10 11:44:19,325 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1229 [INFO]: saving network
2025-05-10 11:44:19,329 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 11:44:19,338 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 6 hours, 8 minutes, 47 seconds)
2025-05-10 11:48:15,942 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 11:48:17,717 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 367.93954 ± 95.075
2025-05-10 11:48:17,717 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [396.29343, 531.58167, 443.9864, 341.94263, 361.32635, 294.06848, 358.99402, 240.41716, 224.63916, 486.14597]
2025-05-10 11:48:17,717 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [79.0, 101.0, 85.0, 66.0, 67.0, 59.0, 70.0, 48.0, 47.0, 96.0]
2025-05-10 11:48:17,717 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1226 [INFO]: New best (367.94) for latency MM1Queue_a033_s075
2025-05-10 11:48:17,717 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1229 [INFO]: saving network
2025-05-10 11:48:17,721 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 11:48:17,730 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 6 hours, 7 minutes, 26 seconds)
2025-05-10 11:52:12,730 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 11:52:14,008 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 278.54788 ± 138.394
2025-05-10 11:52:14,008 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [129.108, 188.0998, 180.39885, 180.83359, 365.10596, 315.9914, 377.56754, 252.90482, 614.6076, 180.8613]
2025-05-10 11:52:14,008 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 38.0, 37.0, 37.0, 70.0, 63.0, 72.0, 52.0, 117.0, 36.0]
2025-05-10 11:52:14,010 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 6 hours, 10 minutes, 28 seconds)
2025-05-10 11:56:12,571 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 11:56:14,337 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 381.52975 ± 36.949
2025-05-10 11:56:14,338 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [358.60516, 433.69305, 326.8865, 385.4987, 391.8443, 332.84442, 440.48697, 412.8906, 365.21942, 367.32855]
2025-05-10 11:56:14,338 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [66.0, 83.0, 62.0, 73.0, 73.0, 64.0, 83.0, 77.0, 70.0, 68.0]
2025-05-10 11:56:14,338 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1226 [INFO]: New best (381.53) for latency MM1Queue_a033_s075
2025-05-10 11:56:14,338 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1229 [INFO]: saving network
2025-05-10 11:56:14,342 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 11:56:14,351 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 6 hours, 7 minutes, 35 seconds)
2025-05-10 12:00:10,714 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 12:00:12,257 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 344.11652 ± 61.842
2025-05-10 12:00:12,257 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [287.44455, 407.9335, 334.3019, 330.99396, 322.92023, 324.17334, 212.83919, 384.4658, 412.97437, 423.11838]
2025-05-10 12:00:12,257 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [55.0, 78.0, 63.0, 64.0, 62.0, 61.0, 42.0, 71.0, 78.0, 79.0]
2025-05-10 12:00:12,259 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 6 hours, 4 minutes, 45 seconds)
2025-05-10 12:04:13,927 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 12:04:15,219 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 272.29346 ± 137.344
2025-05-10 12:04:15,219 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [432.30566, 151.42894, 329.8909, 138.76445, 133.08298, 138.4721, 524.5557, 161.78099, 358.94232, 353.71033]
2025-05-10 12:04:15,219 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [80.0, 30.0, 66.0, 27.0, 26.0, 27.0, 98.0, 33.0, 79.0, 74.0]
2025-05-10 12:04:15,221 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 6 hours, 2 minutes, 45 seconds)
2025-05-10 12:08:15,623 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 12:08:17,627 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 432.96576 ± 52.379
2025-05-10 12:08:17,627 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [366.80786, 419.09235, 399.41705, 527.571, 444.68634, 450.63248, 485.38367, 361.29706, 484.3854, 390.38452]
2025-05-10 12:08:17,627 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [67.0, 79.0, 75.0, 99.0, 82.0, 84.0, 89.0, 66.0, 90.0, 72.0]
2025-05-10 12:08:17,627 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1226 [INFO]: New best (432.97) for latency MM1Queue_a033_s075
2025-05-10 12:08:17,628 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1229 [INFO]: saving network
2025-05-10 12:08:17,632 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 12:08:17,641 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 5 hours, 59 minutes, 58 seconds)
2025-05-10 12:12:17,626 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 12:12:19,373 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 374.13370 ± 114.552
2025-05-10 12:12:19,373 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [393.33502, 402.88196, 415.519, 172.95702, 309.23563, 621.35376, 234.37115, 408.78622, 367.1669, 415.7303]
2025-05-10 12:12:19,373 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [73.0, 78.0, 78.0, 35.0, 58.0, 119.0, 47.0, 76.0, 70.0, 78.0]
2025-05-10 12:12:19,375 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 5 hours, 57 minutes, 35 seconds)
2025-05-10 12:16:17,539 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 12:16:18,841 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 274.37918 ± 115.029
2025-05-10 12:16:18,841 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [168.04124, 321.4418, 271.50192, 158.66718, 314.24426, 404.18533, 118.71618, 263.39832, 515.62036, 207.97528]
2025-05-10 12:16:18,841 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 65.0, 57.0, 31.0, 63.0, 87.0, 23.0, 51.0, 98.0, 42.0]
2025-05-10 12:16:18,844 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 5 hours, 53 minutes, 19 seconds)
2025-05-10 12:20:18,302 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 12:20:20,625 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 462.91568 ± 117.940
2025-05-10 12:20:20,625 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [756.09875, 456.58588, 497.43912, 350.9424, 473.29077, 393.84137, 543.59546, 405.88702, 444.77045, 306.70538]
2025-05-10 12:20:20,625 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [151.0, 90.0, 96.0, 78.0, 98.0, 87.0, 106.0, 78.0, 89.0, 65.0]
2025-05-10 12:20:20,625 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1226 [INFO]: New best (462.92) for latency MM1Queue_a033_s075
2025-05-10 12:20:20,626 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1229 [INFO]: saving network
2025-05-10 12:20:20,630 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 12:20:20,640 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 5 hours, 50 minutes, 25 seconds)
2025-05-10 12:24:21,289 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 12:24:22,807 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 330.74066 ± 47.791
2025-05-10 12:24:22,807 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [398.15973, 282.21918, 268.65137, 290.33948, 341.3099, 367.44205, 384.01343, 299.19943, 388.31033, 287.76163]
2025-05-10 12:24:22,807 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [72.0, 55.0, 52.0, 55.0, 64.0, 69.0, 70.0, 56.0, 71.0, 55.0]
2025-05-10 12:24:22,810 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 5 hours, 46 minutes, 10 seconds)
2025-05-10 12:28:21,297 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 12:28:23,351 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 439.16644 ± 101.393
2025-05-10 12:28:23,351 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [249.28061, 556.3708, 329.9686, 491.33878, 419.07413, 456.70395, 502.11356, 317.80508, 514.6984, 554.3107]
2025-05-10 12:28:23,351 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [48.0, 110.0, 61.0, 91.0, 79.0, 83.0, 94.0, 58.0, 97.0, 106.0]
2025-05-10 12:28:23,354 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 5 hours, 41 minutes, 37 seconds)
2025-05-10 12:32:21,344 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 12:32:22,983 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 362.35718 ± 33.433
2025-05-10 12:32:22,983 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [291.74716, 397.5114, 374.41296, 426.3638, 357.30435, 358.64404, 370.59164, 342.14047, 352.97183, 351.88397]
2025-05-10 12:32:22,983 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [55.0, 73.0, 69.0, 78.0, 67.0, 66.0, 68.0, 63.0, 66.0, 65.0]
2025-05-10 12:32:22,986 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 5 hours, 37 minutes)
2025-05-10 12:36:22,754 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 12:36:24,195 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 325.97537 ± 98.936
2025-05-10 12:36:24,195 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [314.29736, 319.0097, 395.53214, 371.99643, 135.47139, 144.66922, 414.25772, 370.139, 426.26242, 368.11813]
2025-05-10 12:36:24,195 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [59.0, 62.0, 72.0, 69.0, 26.0, 28.0, 77.0, 68.0, 80.0, 68.0]
2025-05-10 12:36:24,199 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 5 hours, 33 minutes, 28 seconds)
2025-05-10 12:40:22,008 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 12:40:23,653 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 356.98602 ± 43.947
2025-05-10 12:40:23,653 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [272.79913, 353.90015, 326.05838, 387.6704, 345.61548, 337.71945, 428.07178, 418.29135, 371.1398, 328.59427]
2025-05-10 12:40:23,653 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [52.0, 66.0, 61.0, 71.0, 68.0, 68.0, 80.0, 77.0, 72.0, 61.0]
2025-05-10 12:40:23,657 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 5 hours, 28 minutes, 49 seconds)
2025-05-10 12:44:23,143 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 12:44:25,413 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 480.61261 ± 104.249
2025-05-10 12:44:25,413 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [392.4631, 701.0537, 403.85968, 414.1632, 506.19354, 394.11255, 531.313, 578.4425, 539.188, 345.33704]
2025-05-10 12:44:25,413 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [72.0, 138.0, 75.0, 79.0, 95.0, 74.0, 102.0, 107.0, 101.0, 69.0]
2025-05-10 12:44:25,413 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1226 [INFO]: New best (480.61) for latency MM1Queue_a033_s075
2025-05-10 12:44:25,414 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1229 [INFO]: saving network
2025-05-10 12:44:25,417 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 12:44:25,428 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 5 hours, 24 minutes, 42 seconds)
2025-05-10 12:48:24,350 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 12:48:26,306 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 397.66684 ± 126.531
2025-05-10 12:48:26,306 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [172.02469, 357.90753, 429.0454, 483.70346, 458.68277, 508.3906, 474.76273, 523.30585, 418.32358, 150.52173]
2025-05-10 12:48:26,307 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 74.0, 80.0, 97.0, 86.0, 97.0, 99.0, 103.0, 88.0, 29.0]
2025-05-10 12:48:26,310 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 5 hours, 20 minutes, 47 seconds)
2025-05-10 12:52:26,099 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 12:52:27,820 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 359.45401 ± 97.351
2025-05-10 12:52:27,820 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [382.1145, 345.37793, 297.49817, 422.90067, 130.0799, 310.67664, 461.77875, 426.9217, 333.2559, 483.9359]
2025-05-10 12:52:27,820 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [70.0, 75.0, 60.0, 80.0, 25.0, 67.0, 88.0, 79.0, 69.0, 99.0]
2025-05-10 12:52:27,824 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 5 hours, 17 minutes, 16 seconds)
2025-05-10 12:56:26,111 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 12:56:27,786 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 364.11276 ± 95.404
2025-05-10 12:56:27,786 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [346.4724, 438.2772, 266.17395, 365.69855, 388.1135, 468.61136, 421.4458, 352.03995, 457.9506, 136.34457]
2025-05-10 12:56:27,786 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [63.0, 81.0, 51.0, 69.0, 71.0, 87.0, 79.0, 66.0, 86.0, 26.0]
2025-05-10 12:56:27,791 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 5 hours, 12 minutes, 56 seconds)
2025-05-10 13:00:26,617 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 13:00:28,023 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 308.16519 ± 130.022
2025-05-10 13:00:28,023 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [360.2662, 171.18706, 427.94858, 433.6185, 154.51752, 399.92728, 367.15747, 154.20494, 477.11096, 135.7134]
2025-05-10 13:00:28,024 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [67.0, 33.0, 80.0, 80.0, 30.0, 77.0, 78.0, 30.0, 90.0, 26.0]
2025-05-10 13:00:28,028 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 5 hours, 9 minutes, 7 seconds)
2025-05-10 13:04:26,466 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 13:04:28,315 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 399.49139 ± 71.639
2025-05-10 13:04:28,316 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [285.6133, 427.87198, 429.10028, 449.02145, 523.98987, 424.82486, 266.27942, 382.16257, 404.7277, 401.32224]
2025-05-10 13:04:28,316 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [55.0, 80.0, 81.0, 84.0, 102.0, 79.0, 54.0, 72.0, 76.0, 75.0]
2025-05-10 13:04:28,320 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 5 hours, 4 minutes, 43 seconds)
2025-05-10 13:08:28,047 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 13:08:30,282 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 437.23657 ± 116.386
2025-05-10 13:08:30,283 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [698.2128, 397.3413, 342.19794, 510.66876, 464.2416, 425.04166, 301.30954, 273.68448, 451.54813, 508.1192]
2025-05-10 13:08:30,283 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [136.0, 86.0, 69.0, 110.0, 89.0, 81.0, 62.0, 56.0, 97.0, 111.0]
2025-05-10 13:08:30,287 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 5 hours, 59 seconds)
2025-05-10 13:12:30,917 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 13:12:32,824 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 409.55939 ± 115.255
2025-05-10 13:12:32,825 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [305.71854, 376.82465, 376.27246, 286.23642, 394.9226, 353.33243, 418.04666, 396.96924, 464.61777, 722.6531]
2025-05-10 13:12:32,825 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [57.0, 70.0, 72.0, 55.0, 73.0, 77.0, 75.0, 72.0, 86.0, 131.0]
2025-05-10 13:12:32,829 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 4 hours, 57 minutes, 14 seconds)
2025-05-10 13:16:31,680 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 13:16:33,783 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 437.97330 ± 84.857
2025-05-10 13:16:33,783 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [318.73535, 479.31155, 296.39337, 455.91885, 486.8324, 460.61465, 613.4734, 449.71515, 420.31744, 398.42102]
2025-05-10 13:16:33,783 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [61.0, 97.0, 58.0, 85.0, 89.0, 100.0, 115.0, 81.0, 86.0, 75.0]
2025-05-10 13:16:33,788 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 4 hours, 53 minutes, 27 seconds)
2025-05-10 13:20:32,645 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 13:20:34,500 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 408.41144 ± 34.537
2025-05-10 13:20:34,501 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [413.44534, 395.04373, 361.87283, 445.9151, 356.4921, 464.59024, 379.78366, 406.31937, 446.38614, 414.2659]
2025-05-10 13:20:34,501 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [75.0, 74.0, 67.0, 81.0, 66.0, 84.0, 70.0, 75.0, 85.0, 77.0]
2025-05-10 13:20:34,506 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 4 hours, 49 minutes, 33 seconds)
2025-05-10 13:24:35,478 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 13:24:37,124 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 350.41467 ± 112.522
2025-05-10 13:24:37,124 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [130.71822, 389.31714, 381.7384, 517.3101, 160.48116, 419.36444, 403.09735, 404.62027, 321.84094, 375.6589]
2025-05-10 13:24:37,124 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 73.0, 70.0, 102.0, 31.0, 79.0, 77.0, 75.0, 67.0, 69.0]
2025-05-10 13:24:37,130 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 4 hours, 46 minutes, 5 seconds)
2025-05-10 13:28:35,860 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 13:28:38,069 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 467.74786 ± 102.301
2025-05-10 13:28:38,069 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [333.1207, 443.42743, 469.2682, 711.4321, 402.06494, 388.9269, 439.4265, 538.27496, 546.9139, 404.6226]
2025-05-10 13:28:38,070 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [61.0, 83.0, 88.0, 132.0, 74.0, 72.0, 82.0, 102.0, 101.0, 74.0]
2025-05-10 13:28:38,075 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 4 hours, 41 minutes, 49 seconds)
2025-05-10 13:32:38,182 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 13:32:40,132 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 420.26166 ± 52.688
2025-05-10 13:32:40,132 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [327.06293, 474.00613, 364.7068, 469.3122, 448.01932, 426.25717, 365.68988, 384.45645, 485.07468, 458.03098]
2025-05-10 13:32:40,132 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [60.0, 86.0, 67.0, 94.0, 82.0, 78.0, 67.0, 70.0, 89.0, 84.0]
2025-05-10 13:32:40,137 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 4 hours, 37 minutes, 40 seconds)
2025-05-10 13:36:41,203 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 13:36:43,187 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 428.97299 ± 72.248
2025-05-10 13:36:43,187 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [347.49963, 438.4353, 442.74805, 406.78653, 412.02945, 406.17453, 356.02023, 445.5013, 410.5093, 624.0255]
2025-05-10 13:36:43,187 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [64.0, 82.0, 81.0, 76.0, 75.0, 74.0, 66.0, 80.0, 75.0, 123.0]
2025-05-10 13:36:43,192 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 4 hours, 34 minutes, 7 seconds)
2025-05-10 13:40:41,231 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 13:40:43,084 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 395.79190 ± 63.092
2025-05-10 13:40:43,085 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [355.79678, 408.46725, 432.78616, 354.75958, 356.41092, 561.0858, 352.05188, 381.86432, 418.17377, 336.52237]
2025-05-10 13:40:43,085 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [65.0, 74.0, 84.0, 65.0, 65.0, 105.0, 65.0, 72.0, 77.0, 62.0]
2025-05-10 13:40:43,091 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 4 hours, 29 minutes, 55 seconds)
2025-05-10 13:44:39,346 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 13:44:40,573 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 275.29633 ± 143.807
2025-05-10 13:44:40,573 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [135.44244, 151.01213, 155.03607, 513.6106, 351.0578, 286.22507, 166.87523, 310.2357, 532.62695, 150.84125]
2025-05-10 13:44:40,573 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 29.0, 30.0, 96.0, 66.0, 58.0, 32.0, 60.0, 97.0, 29.0]
2025-05-10 13:44:40,579 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 4 hours, 24 minutes, 45 seconds)
2025-05-10 13:48:39,887 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 13:48:41,915 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 439.51831 ± 133.115
2025-05-10 13:48:41,915 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [365.94385, 284.8678, 703.7058, 375.74844, 396.7828, 659.1373, 481.1893, 295.10468, 404.5315, 428.17178]
2025-05-10 13:48:41,916 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [66.0, 54.0, 129.0, 71.0, 72.0, 121.0, 88.0, 55.0, 76.0, 77.0]
2025-05-10 13:48:41,922 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 4 hours, 20 minutes, 50 seconds)
2025-05-10 13:52:40,912 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 13:52:43,033 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 448.92001 ± 70.272
2025-05-10 13:52:43,033 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [462.92673, 538.20856, 368.8571, 364.1564, 453.1379, 388.42014, 465.044, 588.7562, 471.06403, 388.6289]
2025-05-10 13:52:43,033 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [85.0, 107.0, 68.0, 69.0, 82.0, 77.0, 83.0, 118.0, 88.0, 70.0]
2025-05-10 13:52:43,039 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 4 hours, 16 minutes, 37 seconds)
2025-05-10 13:56:41,251 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 13:56:43,633 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 486.97552 ± 119.979
2025-05-10 13:56:43,634 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [381.5971, 431.07977, 492.79013, 356.7454, 607.21716, 718.20123, 428.66473, 416.30573, 387.40417, 649.7499]
2025-05-10 13:56:43,634 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [70.0, 80.0, 90.0, 71.0, 119.0, 149.0, 78.0, 76.0, 80.0, 122.0]
2025-05-10 13:56:43,634 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1226 [INFO]: New best (486.98) for latency MM1Queue_a033_s075
2025-05-10 13:56:43,634 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1229 [INFO]: saving network
2025-05-10 13:56:43,638 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 13:56:43,651 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 4 hours, 12 minutes, 5 seconds)
2025-05-10 14:00:41,062 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 14:00:43,313 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 478.34528 ± 87.037
2025-05-10 14:00:43,313 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [439.96667, 601.7731, 488.1129, 451.1863, 388.55838, 424.53024, 486.64062, 354.93088, 656.65204, 491.10138]
2025-05-10 14:00:43,313 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [79.0, 109.0, 89.0, 82.0, 72.0, 83.0, 88.0, 67.0, 137.0, 88.0]
2025-05-10 14:00:43,320 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 4 hours, 8 minutes, 2 seconds)
2025-05-10 14:04:43,536 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 14:04:45,833 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 490.20377 ± 92.547
2025-05-10 14:04:45,833 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [421.51276, 650.23676, 529.44257, 591.1099, 387.32355, 598.77606, 446.18124, 363.21854, 479.18655, 435.04965]
2025-05-10 14:04:45,834 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [77.0, 121.0, 97.0, 106.0, 71.0, 121.0, 81.0, 66.0, 89.0, 79.0]
2025-05-10 14:04:45,834 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1226 [INFO]: New best (490.20) for latency MM1Queue_a033_s075
2025-05-10 14:04:45,834 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1229 [INFO]: saving network
2025-05-10 14:04:45,838 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 14:04:45,852 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 4 hours, 5 minutes, 4 seconds)
2025-05-10 14:08:42,273 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 14:08:44,292 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 438.08243 ± 69.786
2025-05-10 14:08:44,292 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [399.09958, 461.77527, 496.3011, 616.15607, 399.23145, 373.49588, 403.71426, 375.5082, 444.0478, 411.49463]
2025-05-10 14:08:44,293 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [72.0, 84.0, 88.0, 125.0, 73.0, 70.0, 73.0, 68.0, 80.0, 75.0]
2025-05-10 14:08:44,299 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 4 hours, 28 seconds)
2025-05-10 14:12:43,782 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 14:12:46,127 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 499.96939 ± 72.629
2025-05-10 14:12:46,127 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [354.89, 521.2355, 511.06268, 605.55914, 595.53674, 562.73444, 453.6867, 458.07504, 445.86053, 491.05322]
2025-05-10 14:12:46,127 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [64.0, 95.0, 93.0, 122.0, 109.0, 104.0, 82.0, 84.0, 82.0, 89.0]
2025-05-10 14:12:46,128 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1226 [INFO]: New best (499.97) for latency MM1Queue_a033_s075
2025-05-10 14:12:46,128 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1229 [INFO]: saving network
2025-05-10 14:12:46,132 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 14:12:46,146 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 3 hours, 56 minutes, 36 seconds)
2025-05-10 14:16:44,121 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 14:16:46,335 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 464.09833 ± 99.301
2025-05-10 14:16:46,335 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [338.16104, 423.504, 455.50037, 411.09583, 387.15253, 696.64886, 487.6113, 467.37616, 578.0865, 395.84674]
2025-05-10 14:16:46,335 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [61.0, 77.0, 82.0, 74.0, 70.0, 142.0, 88.0, 87.0, 114.0, 79.0]
2025-05-10 14:16:46,342 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 3 hours, 52 minutes, 31 seconds)
2025-05-10 14:20:46,612 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 14:20:49,105 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 517.87549 ± 97.354
2025-05-10 14:20:49,105 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [414.91406, 513.3325, 622.743, 466.755, 703.20917, 371.81143, 466.35928, 455.25894, 585.0556, 579.31616]
2025-05-10 14:20:49,106 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [74.0, 102.0, 126.0, 85.0, 142.0, 73.0, 86.0, 83.0, 106.0, 105.0]
2025-05-10 14:20:49,106 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1226 [INFO]: New best (517.88) for latency MM1Queue_a033_s075
2025-05-10 14:20:49,106 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1229 [INFO]: saving network
2025-05-10 14:20:49,110 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 14:20:49,124 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 3 hours, 49 minutes, 6 seconds)
2025-05-10 14:24:46,064 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 14:24:48,502 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 502.70898 ± 108.298
2025-05-10 14:24:48,502 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [398.75085, 520.6475, 430.86502, 627.74603, 313.90347, 494.01053, 527.13104, 700.556, 436.51193, 576.9673]
2025-05-10 14:24:48,502 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [72.0, 104.0, 77.0, 126.0, 59.0, 91.0, 104.0, 127.0, 81.0, 110.0]
2025-05-10 14:24:48,510 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 3 hours, 44 minutes, 29 seconds)
2025-05-10 14:28:47,849 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 14:28:50,258 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 506.60504 ± 149.701
2025-05-10 14:28:50,258 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [376.67752, 554.22675, 857.03375, 376.295, 426.54233, 426.03766, 359.63666, 447.87076, 597.263, 644.467]
2025-05-10 14:28:50,258 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [68.0, 100.0, 165.0, 68.0, 78.0, 79.0, 66.0, 81.0, 121.0, 118.0]
2025-05-10 14:28:50,266 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 3 hours, 41 minutes, 5 seconds)
2025-05-10 14:32:50,505 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 14:32:52,702 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 467.03012 ± 84.490
2025-05-10 14:32:52,702 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [423.80862, 486.69064, 383.9625, 366.2596, 443.79678, 681.505, 489.58148, 514.5369, 414.3966, 465.7631]
2025-05-10 14:32:52,703 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [76.0, 90.0, 71.0, 69.0, 81.0, 128.0, 90.0, 94.0, 78.0, 87.0]
2025-05-10 14:32:52,710 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 3 hours, 37 minutes, 10 seconds)
2025-05-10 14:36:49,380 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 14:36:51,617 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 476.53598 ± 139.692
2025-05-10 14:36:51,617 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [387.10745, 419.2677, 494.371, 290.17172, 430.34525, 852.69214, 485.56586, 457.4718, 425.55325, 522.8141]
2025-05-10 14:36:51,618 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [69.0, 77.0, 89.0, 56.0, 79.0, 162.0, 88.0, 83.0, 79.0, 94.0]
2025-05-10 14:36:51,626 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 3 hours, 32 minutes, 56 seconds)
2025-05-10 14:40:51,442 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 14:40:53,856 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 516.67090 ± 134.615
2025-05-10 14:40:53,856 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [402.66898, 738.5636, 369.8874, 538.20325, 441.7677, 553.1036, 413.95102, 367.23694, 740.3856, 600.9409]
2025-05-10 14:40:53,856 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [72.0, 141.0, 67.0, 97.0, 80.0, 102.0, 76.0, 68.0, 135.0, 108.0]
2025-05-10 14:40:53,864 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 3 hours, 28 minutes, 49 seconds)
2025-05-10 14:44:53,660 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 14:44:56,368 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 544.49524 ± 125.636
2025-05-10 14:44:56,368 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [389.12048, 705.86755, 436.09976, 431.23914, 517.07056, 523.83636, 541.1839, 480.95456, 602.0591, 817.52106]
2025-05-10 14:44:56,368 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [71.0, 138.0, 86.0, 81.0, 95.0, 111.0, 100.0, 87.0, 121.0, 165.0]
2025-05-10 14:44:56,369 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1226 [INFO]: New best (544.50) for latency MM1Queue_a033_s075
2025-05-10 14:44:56,369 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1229 [INFO]: saving network
2025-05-10 14:44:56,373 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 14:44:56,388 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 3 hours, 25 minutes, 20 seconds)
2025-05-10 14:48:54,241 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 14:48:56,444 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 466.04449 ± 85.921
2025-05-10 14:48:56,444 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [392.61826, 497.60333, 453.76126, 384.36456, 580.66296, 647.1939, 473.43832, 369.55566, 393.28375, 467.96262]
2025-05-10 14:48:56,444 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [73.0, 89.0, 82.0, 81.0, 106.0, 133.0, 86.0, 66.0, 75.0, 85.0]
2025-05-10 14:48:56,452 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 3 hours, 21 minutes, 1 second)
2025-05-10 14:52:56,000 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 14:52:58,260 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 481.54865 ± 71.896
2025-05-10 14:52:58,260 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [404.63696, 442.42123, 583.4153, 488.60202, 546.0214, 582.80414, 395.77185, 392.82367, 534.5081, 444.4812]
2025-05-10 14:52:58,260 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [73.0, 88.0, 104.0, 88.0, 102.0, 119.0, 73.0, 74.0, 97.0, 80.0]
2025-05-10 14:52:58,269 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 3 hours, 16 minutes, 54 seconds)
2025-05-10 14:56:57,705 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 14:57:00,045 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 480.92862 ± 129.475
2025-05-10 14:57:00,045 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [430.746, 391.50916, 510.0813, 473.5529, 428.70993, 412.674, 491.21567, 376.33804, 444.1578, 850.3013]
2025-05-10 14:57:00,045 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [80.0, 78.0, 90.0, 85.0, 79.0, 87.0, 89.0, 70.0, 81.0, 174.0]
2025-05-10 14:57:00,053 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 3 hours, 13 minutes, 20 seconds)
2025-05-10 15:00:59,800 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 15:01:01,985 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 474.09424 ± 40.923
2025-05-10 15:01:01,985 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [432.55148, 499.5102, 442.89026, 462.15344, 428.53485, 502.44904, 567.85284, 469.7099, 440.38403, 494.90637]
2025-05-10 15:01:01,985 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [77.0, 90.0, 81.0, 83.0, 78.0, 102.0, 102.0, 84.0, 83.0, 90.0]
2025-05-10 15:01:01,994 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 3 hours, 9 minutes, 16 seconds)
2025-05-10 15:05:00,652 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 15:05:02,746 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 442.76251 ± 97.563
2025-05-10 15:05:02,746 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [340.17957, 384.8602, 522.6516, 342.49658, 424.43277, 668.3537, 379.99548, 383.82864, 519.4989, 461.3276]
2025-05-10 15:05:02,746 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [63.0, 79.0, 93.0, 64.0, 76.0, 130.0, 71.0, 71.0, 97.0, 88.0]
2025-05-10 15:05:02,755 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 3 hours, 4 minutes, 58 seconds)
2025-05-10 15:09:01,401 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 15:09:03,459 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 444.59784 ± 91.755
2025-05-10 15:09:03,459 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [453.4986, 542.7526, 440.5609, 275.81085, 411.4879, 464.465, 455.5411, 356.11792, 633.0501, 412.6938]
2025-05-10 15:09:03,459 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [81.0, 100.0, 80.0, 54.0, 77.0, 86.0, 83.0, 66.0, 124.0, 75.0]
2025-05-10 15:09:03,469 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 3 hours, 1 minute, 3 seconds)
2025-05-10 15:13:00,879 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 15:13:02,973 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 454.65714 ± 83.027
2025-05-10 15:13:02,973 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [384.5388, 488.09482, 427.08026, 453.57556, 505.1746, 655.13654, 453.0205, 392.8193, 457.21597, 329.9152]
2025-05-10 15:13:02,973 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [69.0, 89.0, 76.0, 83.0, 91.0, 119.0, 82.0, 71.0, 85.0, 62.0]
2025-05-10 15:13:02,982 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 2 hours, 56 minutes, 41 seconds)
2025-05-10 15:17:03,561 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 15:17:05,612 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 447.64420 ± 45.430
2025-05-10 15:17:05,612 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [435.2245, 405.08734, 445.89542, 416.0664, 541.97205, 497.15704, 454.1609, 384.03693, 414.99503, 481.8462]
2025-05-10 15:17:05,612 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [78.0, 76.0, 80.0, 78.0, 101.0, 94.0, 82.0, 69.0, 77.0, 86.0]
2025-05-10 15:17:05,621 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 2 hours, 52 minutes, 47 seconds)
2025-05-10 15:21:04,272 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 15:21:06,480 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 460.23956 ± 102.338
2025-05-10 15:21:06,480 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [395.93112, 386.63065, 316.08862, 341.35184, 527.3222, 428.2798, 527.2759, 443.0755, 622.575, 613.86475]
2025-05-10 15:21:06,480 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [74.0, 73.0, 60.0, 65.0, 95.0, 88.0, 98.0, 87.0, 128.0, 111.0]
2025-05-10 15:21:06,489 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 2 hours, 48 minutes, 37 seconds)
2025-05-10 15:25:04,858 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 15:25:07,094 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 478.91266 ± 28.779
2025-05-10 15:25:07,094 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [450.47763, 471.34134, 471.44064, 504.8187, 536.4614, 484.58334, 467.38687, 474.23004, 427.06348, 501.32257]
2025-05-10 15:25:07,095 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [81.0, 86.0, 89.0, 91.0, 97.0, 88.0, 86.0, 87.0, 88.0, 94.0]
2025-05-10 15:25:07,104 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 44 minutes, 35 seconds)
2025-05-10 15:29:05,103 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 15:29:07,097 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 429.87100 ± 56.149
2025-05-10 15:29:07,097 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [391.74545, 477.14725, 335.92975, 508.87222, 489.15765, 428.93176, 382.3083, 480.00034, 440.79492, 363.82236]
2025-05-10 15:29:07,097 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [71.0, 87.0, 62.0, 92.0, 88.0, 81.0, 70.0, 86.0, 83.0, 67.0]
2025-05-10 15:29:07,107 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 2 hours, 40 minutes, 29 seconds)
2025-05-10 15:33:06,286 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 15:33:08,767 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 513.43250 ± 221.459
2025-05-10 15:33:08,767 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [465.863, 1101.9374, 441.23065, 277.12933, 431.66595, 356.25623, 475.8542, 425.39795, 705.0093, 453.98163]
2025-05-10 15:33:08,767 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [83.0, 223.0, 82.0, 53.0, 78.0, 67.0, 87.0, 76.0, 129.0, 83.0]
2025-05-10 15:33:08,777 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 2 hours, 36 minutes, 45 seconds)
2025-05-10 15:37:05,478 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 15:37:07,754 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 483.35687 ± 38.575
2025-05-10 15:37:07,755 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [454.53256, 478.30664, 447.3107, 453.332, 431.0615, 546.2392, 509.45865, 468.15555, 546.5047, 498.66742]
2025-05-10 15:37:07,755 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [82.0, 87.0, 81.0, 81.0, 81.0, 110.0, 92.0, 89.0, 100.0, 92.0]
2025-05-10 15:37:07,765 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 2 hours, 32 minutes, 16 seconds)
2025-05-10 15:41:07,971 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 15:41:10,331 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 503.25293 ± 69.354
2025-05-10 15:41:10,332 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [445.27734, 538.7807, 488.14185, 562.4479, 570.8909, 598.70404, 527.1326, 378.2369, 517.8765, 405.04034]
2025-05-10 15:41:10,332 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [80.0, 97.0, 88.0, 106.0, 103.0, 113.0, 94.0, 70.0, 102.0, 73.0]
2025-05-10 15:41:10,342 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 2 hours, 28 minutes, 28 seconds)
2025-05-10 15:45:08,611 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 15:45:11,149 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 521.91418 ± 204.286
2025-05-10 15:45:11,150 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [393.22437, 489.03073, 436.4908, 468.644, 430.72437, 636.3156, 1099.353, 366.44666, 453.9217, 444.99017]
2025-05-10 15:45:11,150 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [72.0, 91.0, 80.0, 85.0, 78.0, 117.0, 230.0, 66.0, 86.0, 81.0]
2025-05-10 15:45:11,160 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 2 hours, 24 minutes, 29 seconds)
2025-05-10 15:49:10,863 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 15:49:12,992 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 456.15460 ± 62.775
2025-05-10 15:49:12,992 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [389.14328, 611.4884, 488.6878, 465.85126, 394.98242, 473.065, 429.401, 468.2957, 453.2927, 387.33853]
2025-05-10 15:49:12,992 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [70.0, 119.0, 94.0, 85.0, 71.0, 86.0, 78.0, 85.0, 84.0, 71.0]
2025-05-10 15:49:13,003 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 2 hours, 20 minutes, 41 seconds)
2025-05-10 15:53:09,821 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 15:53:11,957 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 453.94098 ± 80.690
2025-05-10 15:53:11,957 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [351.66458, 657.8318, 441.21243, 412.19162, 487.994, 393.27127, 434.65677, 398.89676, 508.06573, 453.6246]
2025-05-10 15:53:11,957 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [64.0, 121.0, 82.0, 75.0, 89.0, 80.0, 79.0, 72.0, 96.0, 82.0]
2025-05-10 15:53:11,968 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 2 hours, 16 minutes, 21 seconds)
2025-05-10 15:57:12,618 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 15:57:14,804 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 475.60327 ± 75.014
2025-05-10 15:57:14,804 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [388.61392, 525.5523, 373.89612, 527.53827, 399.39078, 640.0572, 469.42517, 481.9136, 485.67264, 463.9725]
2025-05-10 15:57:14,804 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [70.0, 95.0, 68.0, 95.0, 72.0, 118.0, 84.0, 86.0, 91.0, 85.0]
2025-05-10 15:57:14,815 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 2 hours, 12 minutes, 46 seconds)
2025-05-10 16:01:12,309 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 16:01:14,355 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 441.55133 ± 43.974
2025-05-10 16:01:14,355 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [372.60535, 483.76447, 417.11905, 428.5083, 446.8651, 499.33127, 453.13477, 386.28528, 511.25745, 416.64215]
2025-05-10 16:01:14,355 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [67.0, 89.0, 75.0, 80.0, 82.0, 94.0, 83.0, 69.0, 94.0, 77.0]
2025-05-10 16:01:14,367 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 2 hours, 8 minutes, 25 seconds)
2025-05-10 16:05:13,683 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 16:05:15,863 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 472.34610 ± 95.652
2025-05-10 16:05:15,863 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [406.42014, 419.48654, 437.07858, 478.44962, 419.53848, 428.1412, 385.99164, 449.0309, 710.6817, 588.64185]
2025-05-10 16:05:15,863 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [72.0, 77.0, 79.0, 87.0, 76.0, 78.0, 71.0, 83.0, 132.0, 106.0]
2025-05-10 16:05:15,875 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 2 hours, 4 minutes, 29 seconds)
2025-05-10 16:09:15,837 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 16:09:18,116 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 478.65250 ± 68.911
2025-05-10 16:09:18,116 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [426.35983, 466.1302, 433.459, 627.9061, 388.70795, 442.64496, 582.42554, 474.84192, 468.88705, 475.16202]
2025-05-10 16:09:18,116 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [76.0, 84.0, 78.0, 114.0, 71.0, 81.0, 117.0, 84.0, 102.0, 86.0]
2025-05-10 16:09:18,128 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 2 hours, 30 seconds)
2025-05-10 16:13:17,527 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 16:13:19,830 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 494.63632 ± 94.041
2025-05-10 16:13:19,830 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [460.89987, 443.46494, 609.23645, 440.59354, 434.0537, 402.85345, 610.985, 439.73315, 681.37933, 423.16425]
2025-05-10 16:13:19,830 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [82.0, 80.0, 110.0, 79.0, 79.0, 73.0, 118.0, 78.0, 126.0, 76.0]
2025-05-10 16:13:19,842 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 56 minutes, 45 seconds)
2025-05-10 16:17:17,763 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 16:17:20,090 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 488.31976 ± 143.335
2025-05-10 16:17:20,090 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [465.86536, 432.9983, 156.83669, 510.63177, 560.2257, 507.50266, 512.7213, 550.641, 765.5404, 420.23416]
2025-05-10 16:17:20,091 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [83.0, 78.0, 30.0, 103.0, 100.0, 94.0, 91.0, 98.0, 155.0, 78.0]
2025-05-10 16:17:20,102 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 52 minutes, 29 seconds)
2025-05-10 16:21:21,067 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 16:21:23,559 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 527.45642 ± 67.652
2025-05-10 16:21:23,559 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [413.99097, 528.53485, 470.2401, 494.3684, 662.64685, 573.80133, 480.3781, 560.5929, 499.3473, 590.663]
2025-05-10 16:21:23,559 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [75.0, 104.0, 84.0, 89.0, 120.0, 109.0, 86.0, 100.0, 95.0, 116.0]
2025-05-10 16:21:23,571 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 48 minutes, 49 seconds)
2025-05-10 16:25:24,281 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 16:25:26,631 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 495.53754 ± 103.450
2025-05-10 16:25:26,632 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [306.9933, 611.85834, 604.5692, 473.00827, 418.54468, 471.81155, 501.9196, 434.59558, 678.6394, 453.43536]
2025-05-10 16:25:26,632 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [60.0, 112.0, 108.0, 85.0, 76.0, 90.0, 94.0, 78.0, 132.0, 84.0]
2025-05-10 16:25:26,644 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 44 minutes, 55 seconds)
2025-05-10 16:29:24,460 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 16:29:26,600 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 470.30893 ± 41.954
2025-05-10 16:29:26,601 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [372.80063, 533.7631, 438.6441, 457.9286, 489.50766, 493.58298, 478.39917, 461.82443, 512.96515, 463.6733]
2025-05-10 16:29:26,601 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [67.0, 97.0, 79.0, 83.0, 89.0, 89.0, 86.0, 83.0, 95.0, 83.0]
2025-05-10 16:29:26,613 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 40 minutes, 42 seconds)
2025-05-10 16:33:28,415 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 16:33:30,509 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 460.83154 ± 76.726
2025-05-10 16:33:30,509 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [414.4497, 449.35153, 571.6189, 424.1625, 438.6264, 445.64874, 598.45386, 346.74255, 387.92096, 531.3402]
2025-05-10 16:33:30,509 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [74.0, 82.0, 101.0, 80.0, 79.0, 82.0, 108.0, 65.0, 71.0, 96.0]
2025-05-10 16:33:30,521 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 36 minutes, 51 seconds)
2025-05-10 16:37:29,858 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 16:37:32,064 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 474.79901 ± 90.696
2025-05-10 16:37:32,064 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [459.94177, 508.4699, 521.1006, 367.93835, 441.9904, 637.4544, 332.17877, 458.47202, 418.86874, 601.57544]
2025-05-10 16:37:32,064 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [82.0, 91.0, 93.0, 68.0, 80.0, 127.0, 62.0, 82.0, 77.0, 110.0]
2025-05-10 16:37:32,076 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 32 minutes, 55 seconds)
2025-05-10 16:41:32,322 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 16:41:34,718 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 494.78076 ± 172.734
2025-05-10 16:41:34,718 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [432.52216, 135.7792, 492.90707, 472.42184, 423.64722, 856.29114, 518.92255, 618.5515, 580.3348, 416.42984]
2025-05-10 16:41:34,718 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [78.0, 26.0, 100.0, 85.0, 77.0, 172.0, 93.0, 123.0, 105.0, 76.0]
2025-05-10 16:41:34,754 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 28 minutes, 49 seconds)
2025-05-10 16:45:33,818 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 16:45:36,114 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 498.40884 ± 80.213
2025-05-10 16:45:36,114 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [428.4042, 482.9819, 534.49695, 552.8448, 463.186, 461.13553, 657.7948, 584.3201, 366.45593, 452.46796]
2025-05-10 16:45:36,114 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [78.0, 87.0, 96.0, 104.0, 84.0, 84.0, 121.0, 104.0, 68.0, 81.0]
2025-05-10 16:45:36,127 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 24 minutes, 39 seconds)
2025-05-10 16:49:34,582 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 16:49:36,756 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 473.01596 ± 81.492
2025-05-10 16:49:36,756 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [554.5422, 600.94635, 454.07733, 565.7871, 375.83, 454.00748, 536.408, 415.07846, 352.37198, 421.11087]
2025-05-10 16:49:36,756 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [98.0, 109.0, 83.0, 102.0, 69.0, 83.0, 97.0, 76.0, 66.0, 77.0]
2025-05-10 16:49:36,770 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 20 minutes, 40 seconds)
2025-05-10 16:53:39,094 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 16:53:41,463 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 500.26724 ± 108.101
2025-05-10 16:53:41,463 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [397.22614, 407.76685, 575.2075, 492.25433, 733.5603, 421.3382, 602.8892, 412.34192, 401.72345, 558.3642]
2025-05-10 16:53:41,464 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [71.0, 75.0, 106.0, 88.0, 148.0, 78.0, 110.0, 77.0, 74.0, 101.0]
2025-05-10 16:53:41,487 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 1 hour, 16 minutes, 41 seconds)
2025-05-10 16:57:41,391 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 16:57:43,578 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 462.18701 ± 79.287
2025-05-10 16:57:43,579 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [438.90518, 537.21326, 435.53528, 658.672, 383.8743, 433.3894, 487.6856, 420.31244, 373.1463, 453.13617]
2025-05-10 16:57:43,579 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [80.0, 99.0, 82.0, 122.0, 71.0, 81.0, 91.0, 76.0, 73.0, 83.0]
2025-05-10 16:57:43,592 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 1 hour, 12 minutes, 41 seconds)
2025-05-10 17:01:43,012 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 17:01:45,265 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 488.22437 ± 73.137
2025-05-10 17:01:45,265 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [426.60904, 588.71204, 482.2484, 462.99146, 650.96497, 431.7745, 414.4627, 498.9356, 497.42407, 428.1204]
2025-05-10 17:01:45,265 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [77.0, 109.0, 87.0, 84.0, 118.0, 80.0, 76.0, 91.0, 93.0, 78.0]
2025-05-10 17:01:45,278 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 1 hour, 8 minutes, 35 seconds)
2025-05-10 17:05:47,169 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 17:05:49,350 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 466.76300 ± 77.517
2025-05-10 17:05:49,350 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [427.5199, 450.44873, 378.31882, 518.9859, 552.6256, 599.827, 423.23962, 453.66595, 526.636, 336.3621]
2025-05-10 17:05:49,351 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [78.0, 82.0, 70.0, 96.0, 100.0, 110.0, 77.0, 82.0, 105.0, 63.0]
2025-05-10 17:05:49,364 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 1 hour, 4 minutes, 42 seconds)
2025-05-10 17:09:48,475 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 17:09:50,952 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 524.36731 ± 140.722
2025-05-10 17:09:50,952 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [496.40997, 656.9, 307.5822, 614.8158, 528.69464, 834.0849, 418.65637, 520.1941, 452.30637, 414.0295]
2025-05-10 17:09:50,952 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [94.0, 121.0, 58.0, 111.0, 96.0, 155.0, 76.0, 93.0, 84.0, 76.0]
2025-05-10 17:09:50,966 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 1 hour, 42 seconds)
2025-05-10 17:13:50,733 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 17:13:53,045 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 500.28662 ± 65.218
2025-05-10 17:13:53,045 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [432.43912, 439.35553, 473.67194, 495.29404, 623.78094, 407.91806, 584.02563, 500.80692, 493.82645, 551.74774]
2025-05-10 17:13:53,046 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [78.0, 79.0, 84.0, 89.0, 113.0, 76.0, 115.0, 92.0, 90.0, 99.0]
2025-05-10 17:13:53,060 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 56 minutes, 32 seconds)
2025-05-10 17:17:53,686 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 17:17:56,332 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 550.61560 ± 119.506
2025-05-10 17:17:56,332 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [467.1239, 600.8519, 363.2939, 585.87946, 679.5296, 808.1054, 488.64285, 527.1761, 531.42706, 454.1249]
2025-05-10 17:17:56,332 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [86.0, 117.0, 68.0, 106.0, 125.0, 151.0, 87.0, 102.0, 100.0, 81.0]
2025-05-10 17:17:56,332 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1226 [INFO]: New best (550.62) for latency MM1Queue_a033_s075
2025-05-10 17:17:56,332 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1229 [INFO]: saving network
2025-05-10 17:17:56,336 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 17:17:56,358 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 52 minutes, 33 seconds)
2025-05-10 17:21:56,920 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 17:21:59,391 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 517.94788 ± 118.301
2025-05-10 17:21:59,391 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [524.945, 535.64575, 521.3981, 813.01764, 414.12125, 485.64532, 622.0762, 400.8596, 452.96515, 408.804]
2025-05-10 17:21:59,391 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [93.0, 97.0, 94.0, 164.0, 75.0, 90.0, 123.0, 72.0, 83.0, 74.0]
2025-05-10 17:21:59,406 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 48 minutes, 33 seconds)
2025-05-10 17:25:59,890 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 17:26:02,031 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 457.14130 ± 78.650
2025-05-10 17:26:02,031 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [378.51825, 417.35217, 582.53815, 408.72113, 530.2212, 588.50934, 483.43796, 400.20026, 410.08374, 371.8311]
2025-05-10 17:26:02,031 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [68.0, 76.0, 106.0, 75.0, 95.0, 118.0, 87.0, 75.0, 78.0, 69.0]
2025-05-10 17:26:02,045 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 44 minutes, 27 seconds)
2025-05-10 17:30:02,500 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 17:30:04,781 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 485.95239 ± 94.324
2025-05-10 17:30:04,782 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [416.0877, 466.58435, 471.40305, 497.42218, 663.76294, 301.54102, 561.7921, 406.38013, 523.2312, 551.3189]
2025-05-10 17:30:04,782 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [75.0, 84.0, 84.0, 95.0, 122.0, 57.0, 101.0, 75.0, 98.0, 110.0]
2025-05-10 17:30:04,796 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 40 minutes, 27 seconds)
2025-05-10 17:34:04,711 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 17:34:07,093 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 514.59448 ± 100.058
2025-05-10 17:34:07,093 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [510.38733, 690.89923, 358.85123, 429.76004, 617.8112, 483.74554, 414.27515, 592.1286, 593.6963, 454.39032]
2025-05-10 17:34:07,093 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [90.0, 130.0, 68.0, 77.0, 113.0, 87.0, 77.0, 107.0, 107.0, 81.0]
2025-05-10 17:34:07,108 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 36 minutes, 25 seconds)
2025-05-10 17:38:06,090 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 17:38:08,781 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 559.49200 ± 131.599
2025-05-10 17:38:08,781 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [814.4966, 527.4185, 447.31516, 412.9762, 701.6417, 525.05164, 556.27625, 353.82812, 627.2425, 628.67285]
2025-05-10 17:38:08,781 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [161.0, 95.0, 81.0, 76.0, 131.0, 109.0, 99.0, 64.0, 115.0, 113.0]
2025-05-10 17:38:08,781 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1226 [INFO]: New best (559.49) for latency MM1Queue_a033_s075
2025-05-10 17:38:08,781 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1229 [INFO]: saving network
2025-05-10 17:38:08,785 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 17:38:08,807 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 32 minutes, 19 seconds)
2025-05-10 17:42:09,016 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 17:42:11,237 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 482.73468 ± 107.794
2025-05-10 17:42:11,237 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [549.7107, 478.47348, 456.07935, 402.66635, 422.10162, 755.5495, 551.6883, 405.53662, 364.27832, 441.26276]
2025-05-10 17:42:11,237 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [100.0, 85.0, 84.0, 73.0, 77.0, 139.0, 100.0, 73.0, 67.0, 80.0]
2025-05-10 17:42:11,252 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 28 minutes, 16 seconds)
2025-05-10 17:46:11,707 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 17:46:14,253 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 528.64685 ± 132.833
2025-05-10 17:46:14,253 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [439.2034, 456.5109, 459.27747, 646.95306, 387.6668, 452.0917, 739.15576, 425.2812, 500.82913, 779.4986]
2025-05-10 17:46:14,253 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [78.0, 82.0, 83.0, 132.0, 72.0, 82.0, 136.0, 83.0, 92.0, 152.0]
2025-05-10 17:46:14,268 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 24 minutes, 14 seconds)
2025-05-10 17:50:14,637 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 17:50:16,812 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 479.36020 ± 77.253
2025-05-10 17:50:16,812 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [405.77533, 672.81976, 402.82596, 452.12503, 541.9083, 475.10596, 466.11707, 434.0589, 515.907, 426.9587]
2025-05-10 17:50:16,812 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [73.0, 123.0, 73.0, 82.0, 97.0, 85.0, 84.0, 78.0, 93.0, 77.0]
2025-05-10 17:50:16,828 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 20 minutes, 12 seconds)
2025-05-10 17:54:19,165 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 17:54:22,129 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 613.03247 ± 157.682
2025-05-10 17:54:22,129 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [474.95987, 483.2395, 530.2706, 837.00574, 926.7356, 538.66705, 667.12384, 417.0595, 556.86945, 698.3937]
2025-05-10 17:54:22,130 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [85.0, 87.0, 95.0, 160.0, 174.0, 96.0, 121.0, 76.0, 108.0, 137.0]
2025-05-10 17:54:22,130 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1226 [INFO]: New best (613.03) for latency MM1Queue_a033_s075
2025-05-10 17:54:22,130 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1229 [INFO]: saving network
2025-05-10 17:54:22,134 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-sac-aug-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 17:54:22,157 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 16 minutes, 12 seconds)
2025-05-10 17:58:24,899 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 17:58:27,177 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 491.75748 ± 79.649
2025-05-10 17:58:27,177 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [503.43436, 498.31638, 397.93918, 435.18463, 448.45187, 700.37177, 509.83112, 424.7811, 521.14264, 478.1218]
2025-05-10 17:58:27,177 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [89.0, 96.0, 73.0, 79.0, 80.0, 128.0, 91.0, 78.0, 94.0, 85.0]
2025-05-10 17:58:27,219 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 12 minutes, 11 seconds)
2025-05-10 18:02:27,293 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 18:02:29,497 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 477.33145 ± 58.735
2025-05-10 18:02:29,497 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [427.4568, 477.8081, 412.49274, 595.4204, 575.5914, 459.354, 434.33005, 433.3024, 472.03976, 485.519]
2025-05-10 18:02:29,498 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [76.0, 86.0, 74.0, 107.0, 103.0, 82.0, 79.0, 81.0, 87.0, 90.0]
2025-05-10 18:02:29,513 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 8 minutes, 7 seconds)
2025-05-10 18:06:29,099 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 18:06:31,389 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 481.38153 ± 100.689
2025-05-10 18:06:31,389 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [398.52863, 727.53, 399.7887, 480.9792, 469.3273, 561.5185, 389.6371, 532.5724, 470.23785, 383.6956]
2025-05-10 18:06:31,389 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [72.0, 145.0, 73.0, 92.0, 85.0, 101.0, 73.0, 94.0, 88.0, 70.0]
2025-05-10 18:06:31,405 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 4 minutes, 3 seconds)
2025-05-10 18:10:31,013 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 18:10:33,246 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 483.71005 ± 101.895
2025-05-10 18:10:33,246 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [356.9641, 438.47565, 724.4149, 391.42715, 438.94144, 491.82916, 560.91125, 561.36053, 455.3728, 417.40332]
2025-05-10 18:10:33,246 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [64.0, 81.0, 134.0, 71.0, 78.0, 88.0, 99.0, 105.0, 85.0, 77.0]
2025-05-10 18:10:33,262 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1251 [DEBUG]: Training session finished
