2025-05-09 07:06:07,282 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-sac
2025-05-09 07:06:07,282 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-sac
2025-05-09 07:06:07,282 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x7d581743ef70>}
2025-05-09 07:06:07,282 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1111 [DEBUG]: using device: cpu
2025-05-09 07:06:07,289 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1133 [INFO]: Creating new trainer
2025-05-09 07:06:07,297 baseline-sac-noisy-humanoid:111 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=376, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-05-09 07:06:07,297 baseline-sac-noisy-humanoid:112 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-09 07:06:07,629 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1194 [DEBUG]: Starting training session...
2025-05-09 07:06:07,629 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 1/100
2025-05-09 07:09:22,268 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 07:09:23,807 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 358.23288 ± 80.635
2025-05-09 07:09:23,807 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [411.60596, 404.02237, 424.96768, 431.35806, 396.00497, 319.73798, 144.81866, 322.8202, 379.47324, 347.51984]
2025-05-09 07:09:23,807 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [81.0, 76.0, 88.0, 88.0, 73.0, 63.0, 30.0, 66.0, 85.0, 65.0]
2025-05-09 07:09:23,807 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1226 [INFO]: New best (358.23) for latency MM1Queue_a033_s075
2025-05-09 07:09:23,808 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1229 [INFO]: saving network
2025-05-09 07:09:23,811 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-sac/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 07:09:23,816 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 5 hours, 23 minutes, 42 seconds)
2025-05-09 07:13:06,126 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 07:13:07,162 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 283.50800 ± 90.656
2025-05-09 07:13:07,162 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [278.59644, 243.42, 263.75867, 400.3183, 278.88138, 370.73706, 438.91318, 167.6857, 251.80351, 140.9658]
2025-05-09 07:13:07,162 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [54.0, 47.0, 50.0, 82.0, 53.0, 71.0, 84.0, 33.0, 48.0, 30.0]
2025-05-09 07:13:07,163 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 5 hours, 42 minutes, 37 seconds)
2025-05-09 07:16:46,404 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 07:16:47,688 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 320.39716 ± 90.357
2025-05-09 07:16:47,689 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [276.04453, 418.77704, 451.6929, 160.34775, 378.3236, 299.88004, 383.25552, 307.56952, 183.7306, 344.35]
2025-05-09 07:16:47,689 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [65.0, 78.0, 96.0, 33.0, 73.0, 55.0, 73.0, 59.0, 42.0, 63.0]
2025-05-09 07:16:47,690 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 5 hours, 44 minutes, 55 seconds)
2025-05-09 07:20:29,158 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 07:20:30,703 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 375.01578 ± 52.572
2025-05-09 07:20:30,703 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [372.88986, 387.05533, 476.18152, 410.02005, 331.49838, 365.7854, 318.33334, 396.0846, 280.09872, 412.21036]
2025-05-09 07:20:30,703 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [69.0, 82.0, 95.0, 85.0, 77.0, 70.0, 60.0, 76.0, 56.0, 76.0]
2025-05-09 07:20:30,703 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1226 [INFO]: New best (375.02) for latency MM1Queue_a033_s075
2025-05-09 07:20:30,703 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1229 [INFO]: saving network
2025-05-09 07:20:30,707 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-sac/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 07:20:30,715 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 5 hours, 45 minutes, 14 seconds)
2025-05-09 07:24:12,003 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 07:24:13,433 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 361.79318 ± 93.422
2025-05-09 07:24:13,433 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [446.91794, 289.65326, 392.9307, 435.2235, 339.9466, 202.59776, 422.57333, 214.95827, 492.2772, 380.85358]
2025-05-09 07:24:13,433 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [94.0, 54.0, 81.0, 83.0, 63.0, 39.0, 79.0, 41.0, 106.0, 71.0]
2025-05-09 07:24:13,435 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 5 hours, 43 minutes, 50 seconds)
2025-05-09 07:27:54,487 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 07:27:55,834 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 356.63104 ± 67.173
2025-05-09 07:27:55,834 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [347.4972, 321.55054, 419.22333, 309.79297, 474.01492, 296.59354, 445.19952, 362.96393, 343.55896, 245.91557]
2025-05-09 07:27:55,834 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [63.0, 60.0, 76.0, 59.0, 91.0, 60.0, 86.0, 70.0, 65.0, 48.0]
2025-05-09 07:27:55,836 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 5 hours, 48 minutes, 25 seconds)
2025-05-09 07:31:36,071 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 07:31:37,394 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 339.28659 ± 93.762
2025-05-09 07:31:37,394 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [327.0847, 350.38196, 200.73317, 229.87961, 334.6147, 565.7176, 346.1515, 388.8389, 291.15604, 358.3079]
2025-05-09 07:31:37,394 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [61.0, 66.0, 38.0, 45.0, 75.0, 106.0, 65.0, 79.0, 56.0, 67.0]
2025-05-09 07:31:37,396 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 5 hours, 44 minutes, 10 seconds)
2025-05-09 07:35:20,634 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 07:35:22,146 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 366.96561 ± 89.688
2025-05-09 07:35:22,147 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [267.23413, 328.06378, 344.66736, 497.36703, 307.81305, 442.94098, 212.2431, 377.4271, 392.6538, 499.24588]
2025-05-09 07:35:22,147 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [49.0, 62.0, 64.0, 94.0, 63.0, 97.0, 40.0, 69.0, 90.0, 93.0]
2025-05-09 07:35:22,149 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 5 hours, 41 minutes, 46 seconds)
2025-05-09 07:39:07,609 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 07:39:08,950 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 315.80722 ± 97.003
2025-05-09 07:39:08,950 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [278.98904, 380.11923, 339.47937, 227.84703, 153.20927, 249.0446, 375.07635, 526.2268, 285.6137, 342.46683]
2025-05-09 07:39:08,950 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [55.0, 83.0, 65.0, 47.0, 35.0, 51.0, 80.0, 113.0, 59.0, 65.0]
2025-05-09 07:39:08,952 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 5 hours, 39 minutes, 11 seconds)
2025-05-09 07:42:53,698 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 07:42:55,075 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 331.56683 ± 95.370
2025-05-09 07:42:55,075 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [356.6067, 414.50183, 315.59247, 314.0003, 292.37125, 109.56855, 378.96664, 254.44283, 454.60965, 425.00815]
2025-05-09 07:42:55,075 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [70.0, 93.0, 71.0, 68.0, 59.0, 23.0, 70.0, 52.0, 86.0, 78.0]
2025-05-09 07:42:55,078 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 5 hours, 36 minutes, 29 seconds)
2025-05-09 07:46:41,874 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 07:46:43,548 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 394.32343 ± 93.206
2025-05-09 07:46:43,549 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [380.1325, 364.5373, 329.63434, 629.9651, 363.74887, 339.30807, 280.3022, 431.96274, 471.32483, 352.31833]
2025-05-09 07:46:43,549 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [70.0, 67.0, 63.0, 135.0, 82.0, 78.0, 53.0, 79.0, 103.0, 64.0]
2025-05-09 07:46:43,549 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1226 [INFO]: New best (394.32) for latency MM1Queue_a033_s075
2025-05-09 07:46:43,549 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1229 [INFO]: saving network
2025-05-09 07:46:43,552 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-sac/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 07:46:43,560 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 5 hours, 34 minutes, 33 seconds)
2025-05-09 07:50:31,509 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 07:50:33,301 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 429.67169 ± 97.468
2025-05-09 07:50:33,301 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [592.89636, 274.9768, 459.59805, 349.01547, 574.611, 465.84366, 309.98605, 413.65253, 420.17743, 435.95944]
2025-05-09 07:50:33,301 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [110.0, 54.0, 95.0, 66.0, 111.0, 91.0, 68.0, 78.0, 93.0, 96.0]
2025-05-09 07:50:33,302 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1226 [INFO]: New best (429.67) for latency MM1Queue_a033_s075
2025-05-09 07:50:33,302 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1229 [INFO]: saving network
2025-05-09 07:50:33,305 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-sac/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 07:50:33,313 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 5 hours, 33 minutes, 12 seconds)
2025-05-09 07:54:19,118 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 07:54:20,739 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 410.66397 ± 71.242
2025-05-09 07:54:20,739 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [363.79138, 362.38126, 379.53372, 530.47375, 368.78296, 448.32468, 430.99936, 278.58224, 437.42465, 506.3458]
2025-05-09 07:54:20,739 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [65.0, 66.0, 78.0, 98.0, 81.0, 84.0, 81.0, 52.0, 80.0, 91.0]
2025-05-09 07:54:20,741 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 5 hours, 30 minutes, 11 seconds)
2025-05-09 07:58:13,367 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 07:58:14,901 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 394.67731 ± 36.038
2025-05-09 07:58:14,901 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [370.9124, 414.78574, 389.0408, 385.83267, 306.7719, 441.08298, 404.37744, 385.30127, 417.75916, 430.90863]
2025-05-09 07:58:14,901 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [69.0, 77.0, 70.0, 78.0, 68.0, 81.0, 76.0, 72.0, 80.0, 81.0]
2025-05-09 07:58:14,904 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 5 hours, 28 minutes, 30 seconds)
2025-05-09 08:02:05,928 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 08:02:08,038 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 493.21112 ± 64.047
2025-05-09 08:02:08,039 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [457.86368, 453.47745, 394.39478, 617.5494, 571.8505, 486.09134, 530.16534, 522.0042, 430.75967, 467.9549]
2025-05-09 08:02:08,039 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [88.0, 92.0, 71.0, 119.0, 106.0, 93.0, 104.0, 101.0, 88.0, 87.0]
2025-05-09 08:02:08,039 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1226 [INFO]: New best (493.21) for latency MM1Queue_a033_s075
2025-05-09 08:02:08,039 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1229 [INFO]: saving network
2025-05-09 08:02:08,043 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-sac/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 08:02:08,052 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 5 hours, 26 minutes, 40 seconds)
2025-05-09 08:06:01,478 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 08:06:03,034 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 385.24994 ± 57.160
2025-05-09 08:06:03,034 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [441.41797, 456.34772, 430.1824, 411.12637, 376.98993, 382.46, 368.1016, 341.66122, 246.1767, 398.0356]
2025-05-09 08:06:03,034 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [86.0, 86.0, 93.0, 83.0, 68.0, 70.0, 68.0, 63.0, 47.0, 72.0]
2025-05-09 08:06:03,043 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 5 hours, 24 minutes, 39 seconds)
2025-05-09 08:09:54,729 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 08:09:56,372 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 423.07501 ± 35.692
2025-05-09 08:09:56,372 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [470.22662, 463.02026, 366.0737, 402.73395, 467.23813, 455.1631, 411.59192, 408.27545, 401.66562, 384.7617]
2025-05-09 08:09:56,372 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [87.0, 83.0, 66.0, 73.0, 85.0, 85.0, 83.0, 77.0, 74.0, 75.0]
2025-05-09 08:09:56,376 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 5 hours, 21 minutes, 46 seconds)
2025-05-09 08:13:48,587 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 08:13:50,192 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 405.28003 ± 96.966
2025-05-09 08:13:50,192 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [363.5972, 360.04788, 372.40616, 529.1335, 347.17358, 254.99515, 420.32916, 407.4101, 376.14542, 621.56195]
2025-05-09 08:13:50,192 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [65.0, 64.0, 68.0, 99.0, 62.0, 50.0, 75.0, 73.0, 70.0, 123.0]
2025-05-09 08:13:50,196 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 5 hours, 19 minutes, 39 seconds)
2025-05-09 08:17:43,570 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 08:17:45,383 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 443.21988 ± 88.563
2025-05-09 08:17:45,383 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [411.3675, 536.06415, 319.1091, 336.07925, 348.74023, 450.45663, 611.8352, 474.44272, 502.62967, 441.47427]
2025-05-09 08:17:45,383 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [74.0, 95.0, 65.0, 62.0, 62.0, 81.0, 135.0, 92.0, 92.0, 85.0]
2025-05-09 08:17:45,387 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 5 hours, 16 minutes, 1 second)
2025-05-09 08:21:38,417 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 08:21:40,168 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 422.14032 ± 80.077
2025-05-09 08:21:40,168 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [379.72372, 481.72226, 426.96252, 426.69464, 468.88376, 269.4408, 409.54587, 593.57855, 398.60226, 366.24863]
2025-05-09 08:21:40,168 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [70.0, 91.0, 78.0, 84.0, 103.0, 54.0, 74.0, 117.0, 74.0, 66.0]
2025-05-09 08:21:40,172 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 5 hours, 12 minutes, 33 seconds)
2025-05-09 08:25:31,030 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 08:25:32,637 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 402.00558 ± 108.599
2025-05-09 08:25:32,637 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [494.0515, 476.50275, 416.54135, 386.37476, 458.9461, 479.24738, 104.823524, 439.3018, 425.4473, 338.81943]
2025-05-09 08:25:32,637 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [89.0, 87.0, 75.0, 69.0, 98.0, 87.0, 21.0, 79.0, 78.0, 63.0]
2025-05-09 08:25:32,640 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 5 hours, 7 minutes, 59 seconds)
2025-05-09 08:29:25,087 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 08:29:27,129 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 510.76025 ± 178.402
2025-05-09 08:29:27,129 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [267.3725, 643.6099, 450.75702, 454.0166, 393.52432, 943.7046, 472.28384, 615.0808, 501.14786, 366.10522]
2025-05-09 08:29:27,129 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [49.0, 117.0, 82.0, 82.0, 72.0, 179.0, 86.0, 112.0, 93.0, 67.0]
2025-05-09 08:29:27,129 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1226 [INFO]: New best (510.76) for latency MM1Queue_a033_s075
2025-05-09 08:29:27,129 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1229 [INFO]: saving network
2025-05-09 08:29:27,132 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-sac/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 08:29:27,141 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 5 hours, 4 minutes, 23 seconds)
2025-05-09 08:33:21,786 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 08:33:23,858 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 510.04303 ± 189.230
2025-05-09 08:33:23,858 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [557.8091, 434.90558, 510.7913, 268.7054, 551.51886, 327.77036, 499.94464, 585.08124, 372.00955, 991.894]
2025-05-09 08:33:23,858 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [110.0, 77.0, 91.0, 51.0, 100.0, 60.0, 93.0, 106.0, 68.0, 192.0]
2025-05-09 08:33:23,862 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 5 hours, 1 minute, 14 seconds)
2025-05-09 08:37:16,518 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 08:37:18,540 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 479.41400 ± 80.002
2025-05-09 08:37:18,540 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [431.6392, 415.87268, 396.32654, 665.5273, 593.48865, 458.33383, 479.30554, 438.6094, 463.0043, 452.03268]
2025-05-09 08:37:18,540 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [94.0, 76.0, 72.0, 142.0, 111.0, 89.0, 87.0, 82.0, 86.0, 82.0]
2025-05-09 08:37:18,544 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 4 hours, 57 minutes, 11 seconds)
2025-05-09 08:41:12,235 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 08:41:14,369 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 520.65802 ± 120.423
2025-05-09 08:41:14,369 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [674.6994, 662.9925, 371.69766, 478.35373, 663.7685, 533.0185, 585.0641, 517.09784, 377.24832, 342.6394]
2025-05-09 08:41:14,369 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [136.0, 123.0, 67.0, 100.0, 123.0, 98.0, 107.0, 96.0, 68.0, 66.0]
2025-05-09 08:41:14,370 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1226 [INFO]: New best (520.66) for latency MM1Queue_a033_s075
2025-05-09 08:41:14,370 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1229 [INFO]: saving network
2025-05-09 08:41:14,373 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-sac/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 08:41:14,382 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 4 hours, 53 minutes, 33 seconds)
2025-05-09 08:45:05,002 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 08:45:07,099 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 514.06866 ± 115.394
2025-05-09 08:45:07,099 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [526.7973, 513.29425, 552.81903, 471.92108, 792.9375, 363.04007, 459.10205, 419.1203, 613.05225, 428.60278]
2025-05-09 08:45:07,099 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [99.0, 94.0, 103.0, 86.0, 152.0, 72.0, 82.0, 91.0, 124.0, 77.0]
2025-05-09 08:45:07,104 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 4 hours, 49 minutes, 42 seconds)
2025-05-09 08:49:01,452 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 08:49:03,335 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 457.19638 ± 93.465
2025-05-09 08:49:03,335 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [307.9262, 456.1284, 440.59454, 421.86676, 390.8091, 462.13766, 541.12054, 477.4162, 397.45578, 676.5088]
2025-05-09 08:49:03,335 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [68.0, 93.0, 80.0, 78.0, 72.0, 86.0, 100.0, 89.0, 77.0, 137.0]
2025-05-09 08:49:03,340 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 4 hours, 46 minutes, 12 seconds)
2025-05-09 08:52:53,235 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 08:52:55,438 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 511.01895 ± 145.329
2025-05-09 08:52:55,439 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [423.5879, 663.9448, 575.16205, 497.349, 749.40594, 631.2, 333.592, 392.26343, 275.0206, 568.6643]
2025-05-09 08:52:55,439 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [79.0, 125.0, 123.0, 100.0, 143.0, 132.0, 77.0, 75.0, 65.0, 119.0]
2025-05-09 08:52:55,444 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 4 hours, 41 minutes, 10 seconds)
2025-05-09 08:56:48,073 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 08:56:49,750 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 428.19507 ± 41.978
2025-05-09 08:56:49,750 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [481.809, 391.61038, 398.14502, 498.19736, 417.85187, 346.57635, 454.34637, 440.26703, 425.67453, 427.47287]
2025-05-09 08:56:49,750 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [88.0, 72.0, 74.0, 97.0, 77.0, 64.0, 86.0, 80.0, 78.0, 79.0]
2025-05-09 08:56:49,755 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 4 hours, 37 minutes, 11 seconds)
2025-05-09 09:00:40,847 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 09:00:43,063 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 540.03448 ± 117.745
2025-05-09 09:00:43,064 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [508.93286, 519.2571, 508.37497, 472.39404, 352.6475, 415.65997, 584.36694, 599.8708, 644.6672, 794.1734]
2025-05-09 09:00:43,064 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [94.0, 95.0, 92.0, 92.0, 66.0, 74.0, 107.0, 129.0, 127.0, 154.0]
2025-05-09 09:00:43,064 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1226 [INFO]: New best (540.03) for latency MM1Queue_a033_s075
2025-05-09 09:00:43,064 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1229 [INFO]: saving network
2025-05-09 09:00:43,067 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-sac/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 09:00:43,077 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 4 hours, 32 minutes, 41 seconds)
2025-05-09 09:04:33,623 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 09:04:35,695 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 503.35410 ± 109.063
2025-05-09 09:04:35,695 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [431.1157, 506.60254, 703.3051, 627.44855, 428.03058, 445.76425, 595.58215, 505.79062, 488.0485, 301.85324]
2025-05-09 09:04:35,695 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [80.0, 91.0, 139.0, 117.0, 78.0, 82.0, 110.0, 96.0, 104.0, 61.0]
2025-05-09 09:04:35,701 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 4 hours, 28 minutes, 46 seconds)
2025-05-09 09:08:26,475 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 09:08:28,554 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 513.68604 ± 148.143
2025-05-09 09:08:28,554 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [427.68643, 495.71204, 899.4061, 654.5048, 526.7736, 439.1113, 394.43994, 473.6804, 437.42456, 388.1211]
2025-05-09 09:08:28,554 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [77.0, 92.0, 170.0, 122.0, 97.0, 81.0, 72.0, 95.0, 79.0, 70.0]
2025-05-09 09:08:28,559 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 4 hours, 24 minutes, 6 seconds)
2025-05-09 09:12:19,405 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 09:12:21,512 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 506.95654 ± 120.450
2025-05-09 09:12:21,512 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [344.40665, 420.69833, 573.6095, 347.30408, 616.30585, 747.60944, 548.74194, 548.55835, 417.88678, 504.4445]
2025-05-09 09:12:21,512 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [74.0, 82.0, 111.0, 67.0, 118.0, 160.0, 103.0, 101.0, 74.0, 92.0]
2025-05-09 09:12:21,517 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 4 hours, 20 minutes, 25 seconds)
2025-05-09 09:16:11,264 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 09:16:13,188 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 459.66876 ± 87.842
2025-05-09 09:16:13,188 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [502.75754, 488.8217, 445.6429, 495.013, 264.0171, 503.43463, 458.7864, 397.905, 624.2753, 416.03427]
2025-05-09 09:16:13,189 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [108.0, 91.0, 81.0, 94.0, 51.0, 93.0, 100.0, 77.0, 119.0, 85.0]
2025-05-09 09:16:13,195 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 4 hours, 15 minutes, 57 seconds)
2025-05-09 09:20:04,120 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 09:20:05,952 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 452.39981 ± 59.250
2025-05-09 09:20:05,952 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [490.1976, 400.55496, 564.6097, 437.79526, 511.69894, 410.56238, 503.58337, 441.32083, 371.40356, 392.27173]
2025-05-09 09:20:05,952 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [91.0, 73.0, 118.0, 78.0, 96.0, 74.0, 95.0, 83.0, 68.0, 72.0]
2025-05-09 09:20:05,957 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 4 hours, 11 minutes, 57 seconds)
2025-05-09 09:23:56,870 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 09:23:59,134 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 530.86359 ± 164.897
2025-05-09 09:23:59,134 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [673.3721, 416.47702, 416.5189, 533.20245, 423.5312, 946.1761, 328.1369, 528.8893, 529.6408, 512.69116]
2025-05-09 09:23:59,134 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [145.0, 76.0, 75.0, 103.0, 76.0, 178.0, 68.0, 112.0, 100.0, 93.0]
2025-05-09 09:23:59,140 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 4 hours, 8 minutes, 12 seconds)
2025-05-09 09:27:51,938 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 09:27:53,994 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 503.28824 ± 101.417
2025-05-09 09:27:53,994 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [490.48746, 592.35394, 385.609, 556.7914, 460.90448, 731.0702, 531.9127, 459.9596, 362.07462, 461.71893]
2025-05-09 09:27:53,994 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [92.0, 111.0, 72.0, 106.0, 85.0, 141.0, 99.0, 83.0, 83.0, 85.0]
2025-05-09 09:27:53,999 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 4 hours, 4 minutes, 44 seconds)
2025-05-09 09:31:45,411 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 09:31:47,017 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 413.77448 ± 97.329
2025-05-09 09:31:47,017 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [315.89966, 465.22522, 509.28427, 482.35855, 185.71086, 427.32727, 372.39053, 411.58627, 434.45953, 533.50244]
2025-05-09 09:31:47,017 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [58.0, 84.0, 94.0, 88.0, 41.0, 81.0, 66.0, 76.0, 78.0, 102.0]
2025-05-09 09:31:47,024 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 4 hours, 52 seconds)
2025-05-09 09:35:36,940 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 09:35:39,230 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 537.96301 ± 91.339
2025-05-09 09:35:39,230 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [302.24042, 611.689, 604.54486, 573.6317, 603.62366, 499.68344, 477.36823, 524.0437, 565.25214, 617.5528]
2025-05-09 09:35:39,230 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [58.0, 129.0, 114.0, 125.0, 115.0, 91.0, 88.0, 98.0, 105.0, 120.0]
2025-05-09 09:35:39,236 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 3 hours, 57 minutes, 5 seconds)
2025-05-09 09:39:30,228 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 09:39:32,445 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 545.10968 ± 131.851
2025-05-09 09:39:32,445 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [619.5835, 500.93512, 717.6053, 514.0238, 334.22736, 620.7566, 787.75543, 452.98022, 468.7388, 434.49057]
2025-05-09 09:39:32,445 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [116.0, 91.0, 133.0, 94.0, 61.0, 113.0, 149.0, 96.0, 86.0, 91.0]
2025-05-09 09:39:32,446 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1226 [INFO]: New best (545.11) for latency MM1Queue_a033_s075
2025-05-09 09:39:32,446 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1229 [INFO]: saving network
2025-05-09 09:39:32,449 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-sac/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 09:39:32,463 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 3 hours, 53 minutes, 18 seconds)
2025-05-09 09:43:23,017 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 09:43:24,965 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 463.64844 ± 60.942
2025-05-09 09:43:24,966 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [519.14703, 468.8216, 439.41446, 348.19577, 399.22394, 515.20325, 426.24023, 568.6908, 492.80304, 458.7443]
2025-05-09 09:43:24,966 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [103.0, 95.0, 97.0, 75.0, 87.0, 95.0, 78.0, 121.0, 94.0, 84.0]
2025-05-09 09:43:24,972 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 3 hours, 49 minutes, 16 seconds)
2025-05-09 09:47:17,306 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 09:47:19,413 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 497.88095 ± 134.518
2025-05-09 09:47:19,413 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [619.7868, 744.70154, 558.03894, 552.7269, 578.69965, 415.7355, 390.37396, 444.11407, 239.20415, 435.42838]
2025-05-09 09:47:19,413 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [120.0, 143.0, 104.0, 103.0, 121.0, 89.0, 79.0, 80.0, 44.0, 92.0]
2025-05-09 09:47:19,419 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 3 hours, 45 minutes, 18 seconds)
2025-05-09 09:51:12,997 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 09:51:15,194 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 512.69611 ± 134.075
2025-05-09 09:51:15,194 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [420.94846, 328.63474, 446.50607, 619.0793, 532.7054, 564.49133, 809.237, 368.59567, 588.79553, 447.96786]
2025-05-09 09:51:15,194 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [77.0, 59.0, 96.0, 137.0, 113.0, 114.0, 156.0, 67.0, 126.0, 84.0]
2025-05-09 09:51:15,202 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 3 hours, 41 minutes, 57 seconds)
2025-05-09 09:55:04,847 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 09:55:06,959 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 518.44080 ± 167.996
2025-05-09 09:55:06,959 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [656.63074, 645.91064, 224.73244, 653.4594, 458.50528, 467.4233, 737.64905, 529.464, 228.64673, 581.9873]
2025-05-09 09:55:06,959 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [122.0, 119.0, 44.0, 121.0, 82.0, 83.0, 155.0, 98.0, 51.0, 109.0]
2025-05-09 09:55:06,966 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 3 hours, 37 minutes, 58 seconds)
2025-05-09 09:58:57,729 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 09:59:00,709 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 696.24231 ± 231.525
2025-05-09 09:59:00,709 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [1026.6, 794.15, 630.0961, 566.7959, 560.0846, 1186.0637, 457.66257, 720.87964, 436.37027, 583.72064]
2025-05-09 09:59:00,709 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [199.0, 152.0, 120.0, 109.0, 106.0, 231.0, 98.0, 143.0, 85.0, 106.0]
2025-05-09 09:59:00,709 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1226 [INFO]: New best (696.24) for latency MM1Queue_a033_s075
2025-05-09 09:59:00,709 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1229 [INFO]: saving network
2025-05-09 09:59:00,713 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-sac/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 09:59:00,724 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 3 hours, 34 minutes, 10 seconds)
2025-05-09 10:02:52,556 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 10:02:54,938 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 550.67596 ± 187.976
2025-05-09 10:02:54,938 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [332.4811, 477.48718, 471.04315, 884.1106, 480.19202, 922.41736, 377.04163, 460.77444, 523.45245, 577.7597]
2025-05-09 10:02:54,938 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [68.0, 101.0, 86.0, 174.0, 87.0, 179.0, 71.0, 98.0, 117.0, 111.0]
2025-05-09 10:02:54,945 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 3 hours, 30 minutes, 35 seconds)
2025-05-09 10:06:45,412 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 10:06:47,931 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 601.96729 ± 227.346
2025-05-09 10:06:47,932 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [805.6972, 584.1374, 432.87918, 409.61966, 1203.9044, 574.4409, 542.0421, 439.2385, 533.4064, 494.30737]
2025-05-09 10:06:47,932 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [152.0, 108.0, 81.0, 73.0, 239.0, 104.0, 104.0, 79.0, 100.0, 95.0]
2025-05-09 10:06:47,940 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 3 hours, 26 minutes, 26 seconds)
2025-05-09 10:10:39,971 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 10:10:41,910 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 493.52286 ± 98.696
2025-05-09 10:10:41,910 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [432.06058, 438.26578, 763.4896, 485.27298, 457.72687, 433.50223, 476.09576, 447.91608, 572.091, 428.8074]
2025-05-09 10:10:41,910 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [88.0, 78.0, 142.0, 90.0, 83.0, 79.0, 90.0, 81.0, 105.0, 78.0]
2025-05-09 10:10:41,917 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 3 hours, 22 minutes, 13 seconds)
2025-05-09 10:14:32,868 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 10:14:35,046 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 522.73572 ± 74.682
2025-05-09 10:14:35,046 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [496.85086, 476.007, 403.7835, 569.0275, 568.93195, 691.1236, 461.27435, 533.2165, 481.28955, 545.852]
2025-05-09 10:14:35,046 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [94.0, 89.0, 74.0, 107.0, 104.0, 132.0, 99.0, 111.0, 88.0, 116.0]
2025-05-09 10:14:35,053 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 3 hours, 18 minutes, 34 seconds)
2025-05-09 10:18:26,307 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 10:18:28,743 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 569.48096 ± 147.665
2025-05-09 10:18:28,743 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [246.5504, 450.8235, 554.53937, 530.1585, 629.67267, 591.8223, 789.1472, 673.7347, 738.0244, 490.3363]
2025-05-09 10:18:28,744 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [55.0, 96.0, 102.0, 118.0, 120.0, 113.0, 150.0, 126.0, 144.0, 92.0]
2025-05-09 10:18:28,751 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 3 hours, 14 minutes, 40 seconds)
2025-05-09 10:22:18,122 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 10:22:21,140 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 675.18182 ± 173.500
2025-05-09 10:22:21,140 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [483.34787, 605.37506, 655.38336, 609.54767, 699.00653, 1085.2983, 634.1277, 555.8962, 528.51447, 895.32153]
2025-05-09 10:22:21,140 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [94.0, 115.0, 139.0, 117.0, 150.0, 215.0, 119.0, 105.0, 101.0, 167.0]
2025-05-09 10:22:21,149 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 3 hours, 10 minutes, 28 seconds)
2025-05-09 10:26:15,263 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 10:26:18,448 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 712.62842 ± 243.748
2025-05-09 10:26:18,449 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [601.09576, 547.1935, 1269.6252, 795.0865, 743.9145, 678.8477, 917.8498, 297.7038, 706.7075, 568.25995]
2025-05-09 10:26:18,449 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [130.0, 113.0, 241.0, 165.0, 155.0, 129.0, 186.0, 55.0, 134.0, 103.0]
2025-05-09 10:26:18,449 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1226 [INFO]: New best (712.63) for latency MM1Queue_a033_s075
2025-05-09 10:26:18,449 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1229 [INFO]: saving network
2025-05-09 10:26:18,453 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-sac/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 10:26:18,467 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 3 hours, 7 minutes, 17 seconds)
2025-05-09 10:30:09,399 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 10:30:12,303 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 652.67688 ± 126.575
2025-05-09 10:30:12,303 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [496.3442, 962.67847, 606.9874, 708.8041, 719.5164, 589.5735, 641.96625, 706.8171, 529.42786, 564.6531]
2025-05-09 10:30:12,303 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [98.0, 197.0, 131.0, 136.0, 153.0, 109.0, 124.0, 132.0, 103.0, 104.0]
2025-05-09 10:30:12,312 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 3 hours, 3 minutes, 21 seconds)
2025-05-09 10:34:03,441 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 10:34:05,813 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 551.20380 ± 207.631
2025-05-09 10:34:05,813 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [349.3328, 994.29675, 372.17184, 503.4206, 476.44458, 548.2849, 292.66666, 754.49774, 471.00543, 749.91675]
2025-05-09 10:34:05,813 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [67.0, 194.0, 67.0, 111.0, 86.0, 101.0, 52.0, 140.0, 87.0, 159.0]
2025-05-09 10:34:05,822 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 2 hours, 59 minutes, 31 seconds)
2025-05-09 10:38:00,166 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 10:38:02,592 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 581.66980 ± 114.345
2025-05-09 10:38:02,592 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [537.7568, 413.5716, 569.0743, 606.9187, 542.575, 581.18463, 539.14264, 715.422, 471.6385, 839.4142]
2025-05-09 10:38:02,592 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [103.0, 75.0, 111.0, 115.0, 99.0, 108.0, 100.0, 141.0, 90.0, 164.0]
2025-05-09 10:38:02,602 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 2 hours, 56 minutes, 4 seconds)
2025-05-09 10:41:53,250 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 10:41:56,199 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 675.16028 ± 118.664
2025-05-09 10:41:56,199 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [623.67163, 585.4315, 772.42004, 740.4996, 742.0904, 777.91675, 615.65045, 385.95386, 783.417, 724.55145]
2025-05-09 10:41:56,199 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [116.0, 117.0, 148.0, 149.0, 144.0, 148.0, 115.0, 82.0, 147.0, 154.0]
2025-05-09 10:41:56,207 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 2 hours, 52 minutes, 20 seconds)
2025-05-09 10:45:49,665 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 10:45:52,163 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 577.65277 ± 174.308
2025-05-09 10:45:52,163 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [708.4572, 519.2355, 604.64343, 1002.68695, 587.6375, 495.2504, 491.68256, 531.99774, 548.31903, 286.61755]
2025-05-09 10:45:52,163 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [136.0, 97.0, 117.0, 209.0, 112.0, 91.0, 87.0, 107.0, 103.0, 54.0]
2025-05-09 10:45:52,173 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 2 hours, 48 minutes, 13 seconds)
2025-05-09 10:49:45,106 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 10:49:47,826 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 623.92218 ± 222.659
2025-05-09 10:49:47,826 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [758.94965, 567.90076, 459.94778, 1039.4503, 464.1289, 508.77765, 502.86276, 988.8452, 326.89148, 621.46686]
2025-05-09 10:49:47,826 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [147.0, 116.0, 87.0, 194.0, 101.0, 94.0, 95.0, 198.0, 59.0, 117.0]
2025-05-09 10:49:47,834 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 2 hours, 44 minutes, 34 seconds)
2025-05-09 10:53:41,147 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 10:53:44,690 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 785.95538 ± 251.209
2025-05-09 10:53:44,690 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [938.7533, 1247.2396, 980.7765, 712.76373, 489.12024, 386.58307, 793.68774, 744.6952, 1009.0508, 556.88354]
2025-05-09 10:53:44,690 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [179.0, 248.0, 181.0, 137.0, 89.0, 71.0, 169.0, 141.0, 198.0, 106.0]
2025-05-09 10:53:44,690 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1226 [INFO]: New best (785.96) for latency MM1Queue_a033_s075
2025-05-09 10:53:44,690 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1229 [INFO]: saving network
2025-05-09 10:53:44,694 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-sac/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 10:53:44,707 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 41 minutes, 6 seconds)
2025-05-09 10:57:35,806 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 10:57:38,546 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 660.86426 ± 108.124
2025-05-09 10:57:38,546 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [616.1135, 607.3956, 732.6248, 688.56366, 925.5424, 521.8706, 586.74414, 593.07056, 725.60675, 611.1099]
2025-05-09 10:57:38,546 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [119.0, 114.0, 144.0, 132.0, 174.0, 97.0, 108.0, 109.0, 141.0, 116.0]
2025-05-09 10:57:38,556 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 2 hours, 36 minutes, 47 seconds)
2025-05-09 11:01:33,696 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 11:01:36,140 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 589.07385 ± 103.825
2025-05-09 11:01:36,140 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [517.8153, 714.5544, 559.1344, 683.0723, 599.8131, 460.47742, 554.0328, 528.0481, 477.1087, 796.68164]
2025-05-09 11:01:36,140 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [97.0, 135.0, 101.0, 129.0, 110.0, 82.0, 100.0, 106.0, 85.0, 170.0]
2025-05-09 11:01:36,148 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 2 hours, 33 minutes, 23 seconds)
2025-05-09 11:05:25,165 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 11:05:28,376 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 749.73553 ± 195.389
2025-05-09 11:05:28,377 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [606.57513, 569.7015, 898.15753, 759.0394, 870.94104, 706.8708, 728.47174, 556.89026, 578.4826, 1222.2255]
2025-05-09 11:05:28,377 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [122.0, 105.0, 172.0, 138.0, 168.0, 136.0, 131.0, 120.0, 106.0, 238.0]
2025-05-09 11:05:28,386 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 2 hours, 28 minutes, 59 seconds)
2025-05-09 11:09:22,145 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 11:09:26,129 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 830.41809 ± 251.353
2025-05-09 11:09:26,129 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [514.97327, 785.65094, 868.2593, 1155.1761, 532.36395, 969.3908, 605.3525, 802.0266, 735.78467, 1335.2025]
2025-05-09 11:09:26,129 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [101.0, 166.0, 173.0, 231.0, 98.0, 187.0, 112.0, 160.0, 142.0, 284.0]
2025-05-09 11:09:26,129 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1226 [INFO]: New best (830.42) for latency MM1Queue_a033_s075
2025-05-09 11:09:26,130 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1229 [INFO]: saving network
2025-05-09 11:09:26,133 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-sac/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 11:09:26,147 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 2 hours, 25 minutes, 19 seconds)
2025-05-09 11:13:19,191 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 11:13:21,519 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 564.29529 ± 117.761
2025-05-09 11:13:21,519 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [663.4171, 442.10364, 486.60025, 638.75616, 604.93616, 807.50806, 451.62234, 442.229, 635.592, 470.18866]
2025-05-09 11:13:21,519 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [126.0, 84.0, 88.0, 121.0, 109.0, 151.0, 86.0, 96.0, 119.0, 92.0]
2025-05-09 11:13:21,530 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 2 hours, 21 minutes, 13 seconds)
2025-05-09 11:17:15,269 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 11:17:17,668 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 577.13025 ± 165.147
2025-05-09 11:17:17,668 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [372.90607, 581.33826, 398.06183, 916.8332, 552.8508, 523.7924, 527.12604, 845.1693, 508.4261, 544.79767]
2025-05-09 11:17:17,668 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [82.0, 106.0, 72.0, 169.0, 104.0, 96.0, 97.0, 176.0, 97.0, 100.0]
2025-05-09 11:17:17,677 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 2 hours, 17 minutes, 33 seconds)
2025-05-09 11:21:08,846 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 11:21:11,460 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 620.43896 ± 171.698
2025-05-09 11:21:11,461 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [696.21173, 589.03766, 683.5192, 1017.00745, 668.0064, 339.00656, 540.7412, 680.5091, 450.34894, 540.0012]
2025-05-09 11:21:11,461 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [134.0, 107.0, 128.0, 212.0, 130.0, 63.0, 100.0, 126.0, 85.0, 99.0]
2025-05-09 11:21:11,470 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 2 hours, 13 minutes, 12 seconds)
2025-05-09 11:25:02,606 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 11:25:05,364 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 624.90381 ± 169.781
2025-05-09 11:25:05,365 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [621.63556, 374.71405, 592.6298, 778.0877, 382.1251, 567.23914, 495.73984, 902.7853, 719.24347, 814.8378]
2025-05-09 11:25:05,365 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [119.0, 71.0, 120.0, 148.0, 83.0, 106.0, 91.0, 192.0, 139.0, 155.0]
2025-05-09 11:25:05,376 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 2 hours, 9 minutes, 28 seconds)
2025-05-09 11:28:58,023 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 11:29:01,225 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 756.82208 ± 195.988
2025-05-09 11:29:01,226 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [796.5279, 929.8563, 1022.3966, 454.0312, 982.58777, 752.2437, 531.60254, 525.8393, 657.05817, 916.0771]
2025-05-09 11:29:01,226 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [153.0, 177.0, 198.0, 83.0, 196.0, 140.0, 99.0, 96.0, 126.0, 177.0]
2025-05-09 11:29:01,237 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 2 hours, 5 minutes, 20 seconds)
2025-05-09 11:32:56,435 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 11:32:59,062 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 634.96106 ± 119.217
2025-05-09 11:32:59,062 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [794.6255, 642.70825, 622.33514, 634.81946, 624.85596, 489.3038, 445.0808, 650.9944, 574.9995, 869.888]
2025-05-09 11:32:59,062 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [150.0, 119.0, 117.0, 116.0, 115.0, 86.0, 82.0, 139.0, 106.0, 170.0]
2025-05-09 11:32:59,072 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 2 hours, 1 minute, 40 seconds)
2025-05-09 11:36:48,205 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 11:36:50,807 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 612.92688 ± 133.449
2025-05-09 11:36:50,807 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [759.07605, 683.2393, 558.20593, 719.1284, 307.54895, 602.92596, 470.1371, 699.1399, 738.83984, 591.0268]
2025-05-09 11:36:50,807 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [145.0, 132.0, 105.0, 137.0, 63.0, 117.0, 85.0, 140.0, 137.0, 128.0]
2025-05-09 11:36:50,819 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 57 minutes, 18 seconds)
2025-05-09 11:40:43,651 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 11:40:46,150 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 613.61218 ± 108.086
2025-05-09 11:40:46,150 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [798.5362, 728.87085, 521.0475, 585.9762, 649.9824, 457.45938, 686.0329, 455.7188, 582.59674, 669.9008]
2025-05-09 11:40:46,150 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [153.0, 137.0, 95.0, 108.0, 122.0, 95.0, 127.0, 86.0, 107.0, 124.0]
2025-05-09 11:40:46,160 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 53 minutes, 33 seconds)
2025-05-09 11:44:38,067 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 11:44:40,433 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 589.66455 ± 127.785
2025-05-09 11:44:40,433 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [816.53705, 484.5117, 394.74527, 443.4508, 568.42065, 561.05396, 634.7489, 726.8044, 721.1937, 545.1791]
2025-05-09 11:44:40,433 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [154.0, 93.0, 78.0, 78.0, 102.0, 105.0, 115.0, 135.0, 136.0, 98.0]
2025-05-09 11:44:40,445 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 49 minutes, 40 seconds)
2025-05-09 11:48:32,016 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 11:48:34,955 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 715.11536 ± 187.441
2025-05-09 11:48:34,956 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [431.66058, 994.48083, 1049.2925, 632.40137, 788.5633, 757.37, 577.3448, 541.49603, 597.63763, 780.9067]
2025-05-09 11:48:34,956 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [86.0, 188.0, 201.0, 115.0, 153.0, 146.0, 104.0, 96.0, 115.0, 145.0]
2025-05-09 11:48:34,968 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 45 minutes, 38 seconds)
2025-05-09 11:52:29,939 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 11:52:32,741 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 656.72565 ± 183.239
2025-05-09 11:52:32,741 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [620.50946, 771.0193, 789.41864, 858.12006, 395.2252, 641.1441, 575.3948, 383.03168, 555.8801, 977.51324]
2025-05-09 11:52:32,741 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [132.0, 149.0, 155.0, 168.0, 81.0, 122.0, 111.0, 69.0, 101.0, 177.0]
2025-05-09 11:52:32,753 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 41 minutes, 43 seconds)
2025-05-09 11:56:23,517 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 11:56:26,182 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 652.39679 ± 177.714
2025-05-09 11:56:26,182 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [463.58774, 611.8183, 505.90378, 563.87463, 663.12787, 1137.2133, 646.05994, 739.94415, 591.17633, 601.2616]
2025-05-09 11:56:26,182 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [98.0, 113.0, 93.0, 108.0, 124.0, 220.0, 118.0, 139.0, 109.0, 112.0]
2025-05-09 11:56:26,193 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 37 minutes, 56 seconds)
2025-05-09 12:00:21,110 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 12:00:23,543 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 608.04858 ± 46.711
2025-05-09 12:00:23,543 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [666.18225, 531.84827, 580.7435, 620.4685, 569.9785, 586.61725, 624.535, 615.215, 702.1768, 582.7209]
2025-05-09 12:00:23,543 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [128.0, 100.0, 110.0, 115.0, 103.0, 106.0, 114.0, 114.0, 141.0, 106.0]
2025-05-09 12:00:23,553 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 34 minutes, 11 seconds)
2025-05-09 12:04:15,295 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 12:04:18,454 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 724.89447 ± 315.844
2025-05-09 12:04:18,454 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [548.12994, 353.63434, 786.447, 990.156, 895.17566, 314.98242, 948.70905, 545.5091, 1373.7527, 492.44867]
2025-05-09 12:04:18,454 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [119.0, 70.0, 149.0, 188.0, 174.0, 60.0, 189.0, 106.0, 275.0, 95.0]
2025-05-09 12:04:18,467 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 30 minutes, 18 seconds)
2025-05-09 12:08:10,881 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 12:08:13,958 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 711.91199 ± 229.525
2025-05-09 12:08:13,958 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [750.08435, 845.3859, 662.1906, 690.41754, 360.33936, 1133.9839, 762.3486, 668.1999, 925.26483, 320.90506]
2025-05-09 12:08:13,958 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [144.0, 159.0, 120.0, 145.0, 66.0, 222.0, 138.0, 128.0, 182.0, 61.0]
2025-05-09 12:08:13,971 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 26 minutes, 27 seconds)
2025-05-09 12:12:07,592 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 12:12:10,370 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 677.01221 ± 124.194
2025-05-09 12:12:10,370 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [532.46234, 797.11316, 745.15173, 631.17615, 540.8832, 699.1341, 921.7315, 503.58392, 666.3837, 732.502]
2025-05-09 12:12:10,370 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [105.0, 144.0, 140.0, 118.0, 97.0, 132.0, 168.0, 108.0, 121.0, 141.0]
2025-05-09 12:12:10,383 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 22 minutes, 26 seconds)
2025-05-09 12:16:01,000 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 12:16:03,991 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 715.75049 ± 194.599
2025-05-09 12:16:03,992 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [792.5499, 663.7137, 445.82678, 764.8413, 871.3184, 1180.1875, 567.7724, 652.20746, 552.91656, 666.171]
2025-05-09 12:16:03,992 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [139.0, 142.0, 82.0, 150.0, 165.0, 224.0, 109.0, 121.0, 104.0, 119.0]
2025-05-09 12:16:04,005 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 18 minutes, 31 seconds)
2025-05-09 12:19:59,571 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 12:20:03,086 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 845.39911 ± 223.711
2025-05-09 12:20:03,086 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [961.4219, 691.30096, 647.5321, 709.29645, 721.0623, 660.7019, 823.545, 1326.0415, 1179.9362, 733.15314]
2025-05-09 12:20:03,086 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [180.0, 129.0, 117.0, 130.0, 137.0, 119.0, 155.0, 258.0, 232.0, 132.0]
2025-05-09 12:20:03,087 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1226 [INFO]: New best (845.40) for latency MM1Queue_a033_s075
2025-05-09 12:20:03,087 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1229 [INFO]: saving network
2025-05-09 12:20:03,090 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-sac/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 12:20:03,107 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 1 hour, 14 minutes, 42 seconds)
2025-05-09 12:23:53,170 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 12:23:56,152 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 704.21240 ± 180.379
2025-05-09 12:23:56,152 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [633.8763, 652.6874, 1015.3389, 682.03156, 617.29114, 569.0739, 756.08704, 434.0382, 638.82404, 1042.8752]
2025-05-09 12:23:56,152 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [114.0, 125.0, 196.0, 124.0, 118.0, 121.0, 144.0, 78.0, 118.0, 205.0]
2025-05-09 12:23:56,164 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 1 hour, 10 minutes, 39 seconds)
2025-05-09 12:27:48,321 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 12:27:51,356 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 686.06763 ± 149.747
2025-05-09 12:27:51,356 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [723.6705, 895.7765, 534.11, 621.2279, 871.15985, 600.4633, 397.35413, 657.5941, 844.0063, 715.31354]
2025-05-09 12:27:51,356 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [141.0, 173.0, 96.0, 134.0, 167.0, 113.0, 72.0, 129.0, 165.0, 148.0]
2025-05-09 12:27:51,370 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 1 hour, 6 minutes, 43 seconds)
2025-05-09 12:31:46,131 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 12:31:49,471 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 782.11005 ± 244.070
2025-05-09 12:31:49,471 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [1286.0563, 709.485, 678.7341, 990.2105, 928.48553, 769.696, 631.7277, 913.9096, 390.32428, 522.47144]
2025-05-09 12:31:49,471 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [234.0, 135.0, 124.0, 189.0, 176.0, 148.0, 119.0, 193.0, 69.0, 100.0]
2025-05-09 12:31:49,483 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 1 hour, 2 minutes, 53 seconds)
2025-05-09 12:35:42,175 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 12:35:44,966 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 671.27917 ± 196.412
2025-05-09 12:35:44,966 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [565.1029, 817.3687, 1168.5758, 615.7004, 499.14294, 476.70554, 541.11676, 570.9376, 757.24896, 700.8919]
2025-05-09 12:35:44,966 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [103.0, 157.0, 223.0, 123.0, 94.0, 84.0, 106.0, 108.0, 147.0, 129.0]
2025-05-09 12:35:44,978 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 59 minutes, 2 seconds)
2025-05-09 12:39:38,500 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 12:39:41,871 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 811.55017 ± 262.747
2025-05-09 12:39:41,871 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [437.53635, 765.27405, 1372.1489, 1124.0449, 851.14307, 634.434, 936.2822, 628.18713, 786.557, 579.8935]
2025-05-09 12:39:41,871 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [86.0, 142.0, 267.0, 203.0, 179.0, 117.0, 172.0, 117.0, 147.0, 106.0]
2025-05-09 12:39:41,883 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 55 minutes)
2025-05-09 12:43:33,367 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 12:43:37,208 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 921.03986 ± 299.474
2025-05-09 12:43:37,208 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [1652.8961, 491.50723, 1154.3214, 794.75726, 834.9158, 1030.1047, 667.8259, 893.2951, 918.6037, 772.17065]
2025-05-09 12:43:37,208 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [314.0, 88.0, 211.0, 152.0, 157.0, 191.0, 123.0, 164.0, 174.0, 154.0]
2025-05-09 12:43:37,208 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1226 [INFO]: New best (921.04) for latency MM1Queue_a033_s075
2025-05-09 12:43:37,208 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1229 [INFO]: saving network
2025-05-09 12:43:37,212 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-sac/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 12:43:37,233 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 51 minutes, 10 seconds)
2025-05-09 12:47:31,124 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 12:47:33,932 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 681.97510 ± 201.615
2025-05-09 12:47:33,932 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [462.18503, 1107.348, 541.2775, 408.39532, 701.7674, 759.7843, 839.7525, 797.6584, 489.67853, 711.9033]
2025-05-09 12:47:33,932 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [84.0, 214.0, 99.0, 73.0, 137.0, 152.0, 161.0, 144.0, 88.0, 135.0]
2025-05-09 12:47:33,947 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 47 minutes, 18 seconds)
2025-05-09 12:51:24,367 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 12:51:28,282 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 918.33057 ± 375.734
2025-05-09 12:51:28,283 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [701.68884, 1289.6355, 572.3255, 753.3031, 1107.9681, 677.7258, 836.89355, 490.68814, 1810.5508, 942.52655]
2025-05-09 12:51:28,283 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [133.0, 249.0, 107.0, 159.0, 206.0, 129.0, 159.0, 86.0, 335.0, 180.0]
2025-05-09 12:51:28,295 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 43 minutes, 13 seconds)
2025-05-09 12:55:21,791 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 12:55:25,604 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 864.49939 ± 331.202
2025-05-09 12:55:25,604 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [673.7147, 851.71985, 735.61816, 1234.3552, 1704.0013, 497.9439, 753.00104, 693.36194, 740.5439, 760.7334]
2025-05-09 12:55:25,604 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [129.0, 165.0, 133.0, 228.0, 346.0, 89.0, 147.0, 126.0, 153.0, 147.0]
2025-05-09 12:55:25,617 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 39 minutes, 21 seconds)
2025-05-09 12:59:19,754 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 12:59:23,570 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 869.36902 ± 303.185
2025-05-09 12:59:23,570 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [348.79907, 1491.222, 685.6055, 1006.5524, 940.74426, 931.1327, 675.8155, 1180.3055, 814.96893, 618.5443]
2025-05-09 12:59:23,570 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [65.0, 280.0, 148.0, 200.0, 178.0, 179.0, 124.0, 224.0, 167.0, 114.0]
2025-05-09 12:59:23,583 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 35 minutes, 27 seconds)
2025-05-09 13:03:16,047 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 13:03:19,415 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 774.22253 ± 191.600
2025-05-09 13:03:19,415 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [653.68665, 831.26184, 647.42706, 754.9763, 965.7026, 907.242, 688.18866, 699.5063, 1160.7677, 433.46667]
2025-05-09 13:03:19,415 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [120.0, 161.0, 119.0, 159.0, 189.0, 175.0, 145.0, 131.0, 224.0, 76.0]
2025-05-09 13:03:19,428 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 31 minutes, 31 seconds)
2025-05-09 13:07:09,052 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 13:07:12,542 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 825.16541 ± 182.760
2025-05-09 13:07:12,542 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [669.3703, 808.60016, 756.2046, 949.7667, 964.1687, 1133.2042, 715.63806, 455.84824, 811.24207, 987.61035]
2025-05-09 13:07:12,542 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [132.0, 153.0, 146.0, 181.0, 173.0, 218.0, 133.0, 82.0, 163.0, 193.0]
2025-05-09 13:07:12,558 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 27 minutes, 30 seconds)
2025-05-09 13:11:06,951 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 13:11:10,174 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 733.29810 ± 294.686
2025-05-09 13:11:10,174 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [849.3763, 472.8568, 886.51733, 422.66772, 1448.33, 747.12585, 629.9943, 424.5854, 568.31665, 883.21045]
2025-05-09 13:11:10,174 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [161.0, 102.0, 173.0, 76.0, 269.0, 159.0, 135.0, 78.0, 103.0, 170.0]
2025-05-09 13:11:10,187 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 23 minutes, 38 seconds)
2025-05-09 13:15:01,156 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 13:15:04,858 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 898.32458 ± 264.629
2025-05-09 13:15:04,859 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [865.4245, 642.37085, 1567.6808, 809.6521, 710.3532, 1097.9828, 797.15765, 643.9174, 809.7267, 1038.9797]
2025-05-09 13:15:04,859 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [161.0, 116.0, 291.0, 144.0, 132.0, 215.0, 143.0, 119.0, 156.0, 198.0]
2025-05-09 13:15:04,872 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 19 minutes, 39 seconds)
2025-05-09 13:18:58,432 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 13:19:02,117 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 860.59619 ± 449.979
2025-05-09 13:19:02,118 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [488.6795, 874.0881, 636.4463, 670.7778, 751.17084, 731.3563, 1001.7411, 511.62012, 2135.927, 804.1553]
2025-05-09 13:19:02,118 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [89.0, 162.0, 120.0, 127.0, 141.0, 134.0, 189.0, 93.0, 406.0, 169.0]
2025-05-09 13:19:02,131 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 15 minutes, 42 seconds)
2025-05-09 13:22:54,906 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 13:22:58,631 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 871.64392 ± 177.251
2025-05-09 13:22:58,631 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [1049.2412, 1012.12787, 617.85065, 765.90155, 948.99255, 655.57996, 725.87854, 1151.8031, 764.4145, 1024.649]
2025-05-09 13:22:58,631 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [206.0, 193.0, 113.0, 155.0, 180.0, 121.0, 137.0, 208.0, 143.0, 204.0]
2025-05-09 13:22:58,645 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 11 minutes, 47 seconds)
2025-05-09 13:26:50,278 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 13:26:53,648 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 793.67273 ± 319.161
2025-05-09 13:26:53,648 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [672.6518, 833.42346, 702.4389, 914.48975, 1686.6816, 641.44214, 613.71606, 668.4049, 737.93475, 465.54428]
2025-05-09 13:26:53,648 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [126.0, 165.0, 130.0, 172.0, 318.0, 137.0, 110.0, 125.0, 135.0, 87.0]
2025-05-09 13:26:53,662 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 7 minutes, 52 seconds)
2025-05-09 13:30:45,889 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 13:30:49,264 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 786.10461 ± 221.282
2025-05-09 13:30:49,264 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [736.30817, 1019.237, 694.95685, 1023.06714, 561.31177, 1128.5083, 416.90945, 897.4252, 822.15466, 561.1676]
2025-05-09 13:30:49,265 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [136.0, 205.0, 128.0, 191.0, 121.0, 219.0, 75.0, 159.0, 163.0, 103.0]
2025-05-09 13:30:49,281 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 3 minutes, 55 seconds)
2025-05-09 13:34:43,421 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 13:34:47,313 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 906.90320 ± 306.192
2025-05-09 13:34:47,313 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [686.11066, 1109.1726, 882.56793, 699.6152, 354.60684, 1113.3032, 1520.2961, 1124.1293, 736.0938, 843.1364]
2025-05-09 13:34:47,313 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [126.0, 220.0, 162.0, 131.0, 77.0, 209.0, 288.0, 203.0, 143.0, 163.0]
2025-05-09 13:34:47,330 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1251 [DEBUG]: Training session finished
