2025-05-11 03:32:16,015 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-sac-aug-mem4
2025-05-11 03:32:16,015 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-sac-aug-mem4
2025-05-11 03:32:16,016 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x7f2e1e03df70>}
2025-05-11 03:32:16,016 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1111 [DEBUG]: using device: cpu
2025-05-11 03:32:16,016 baseline-sac-noisy-humanoid:77 [WARNING]: args.memorize_actions != args.horizon: 4 != 24
2025-05-11 03:32:16,039 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1133 [INFO]: Creating new trainer
2025-05-11 03:32:16,052 baseline-sac-noisy-humanoid:111 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=444, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-05-11 03:32:16,052 baseline-sac-noisy-humanoid:112 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=461, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-11 03:32:16,592 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1194 [DEBUG]: Starting training session...
2025-05-11 03:32:16,592 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 1/100
2025-05-11 03:36:05,173 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 03:36:06,613 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 306.92139 ± 13.335
2025-05-11 03:36:06,613 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [298.76654, 301.0852, 293.81183, 307.6617, 285.69684, 330.73862, 313.80093, 315.9897, 323.6227, 298.03967]
2025-05-11 03:36:06,613 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [55.0, 56.0, 54.0, 56.0, 52.0, 62.0, 57.0, 58.0, 60.0, 55.0]
2025-05-11 03:36:06,613 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1226 [INFO]: New best (306.92) for latency MM1Queue_a033_s075
2025-05-11 03:36:06,613 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1229 [INFO]: saving network
2025-05-11 03:36:06,617 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 03:36:06,624 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 6 hours, 19 minutes, 33 seconds)
2025-05-11 03:40:23,052 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 03:40:24,214 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 250.34305 ± 89.189
2025-05-11 03:40:24,214 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [174.5384, 236.65796, 439.15024, 185.88936, 282.42868, 179.05678, 381.6637, 264.91147, 169.00108, 190.13281]
2025-05-11 03:40:24,214 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [39.0, 45.0, 89.0, 41.0, 59.0, 41.0, 79.0, 49.0, 39.0, 43.0]
2025-05-11 03:40:24,216 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 6 hours, 38 minutes, 13 seconds)
2025-05-11 03:44:38,712 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 03:44:40,126 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 304.39761 ± 71.114
2025-05-11 03:44:40,127 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [403.93048, 249.3343, 210.24805, 222.80734, 357.9704, 263.27612, 432.00916, 272.84976, 318.22385, 313.3265]
2025-05-11 03:44:40,127 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [75.0, 55.0, 42.0, 50.0, 78.0, 59.0, 79.0, 59.0, 68.0, 69.0]
2025-05-11 03:44:40,128 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 6 hours, 40 minutes, 41 seconds)
2025-05-11 03:48:57,793 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 03:48:59,461 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 356.06247 ± 68.957
2025-05-11 03:48:59,462 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [440.055, 356.9643, 362.04343, 358.27277, 269.76373, 514.56775, 318.10922, 331.13354, 321.84192, 287.87323]
2025-05-11 03:48:59,462 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [83.0, 66.0, 79.0, 77.0, 60.0, 102.0, 68.0, 64.0, 73.0, 64.0]
2025-05-11 03:48:59,462 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1226 [INFO]: New best (356.06) for latency MM1Queue_a033_s075
2025-05-11 03:48:59,462 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1229 [INFO]: saving network
2025-05-11 03:48:59,466 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 03:48:59,475 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 6 hours, 41 minutes, 9 seconds)
2025-05-11 03:53:15,474 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 03:53:16,955 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 326.64792 ± 88.836
2025-05-11 03:53:16,955 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [258.6866, 198.54166, 367.78302, 534.5581, 320.74994, 350.3265, 310.98386, 348.79382, 353.81622, 222.23976]
2025-05-11 03:53:16,955 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [47.0, 38.0, 69.0, 102.0, 61.0, 65.0, 59.0, 67.0, 68.0, 42.0]
2025-05-11 03:53:16,957 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 6 hours, 39 minutes, 6 seconds)
2025-05-11 03:57:36,850 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 03:57:38,414 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 346.90939 ± 64.110
2025-05-11 03:57:38,414 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [397.1113, 299.8244, 372.2738, 260.65945, 446.79666, 260.39322, 411.95947, 324.88303, 402.82843, 292.36423]
2025-05-11 03:57:38,414 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [73.0, 62.0, 66.0, 55.0, 82.0, 56.0, 76.0, 60.0, 89.0, 63.0]
2025-05-11 03:57:38,416 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 6 hours, 44 minutes, 45 seconds)
2025-05-11 04:01:55,174 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 04:01:57,347 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 466.90698 ± 119.591
2025-05-11 04:01:57,347 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [612.28894, 736.5041, 419.8427, 492.02576, 479.87418, 383.14743, 469.22586, 351.3104, 406.9858, 317.86478]
2025-05-11 04:01:57,347 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [116.0, 145.0, 76.0, 93.0, 95.0, 84.0, 91.0, 67.0, 78.0, 69.0]
2025-05-11 04:01:57,347 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1226 [INFO]: New best (466.91) for latency MM1Queue_a033_s075
2025-05-11 04:01:57,348 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1229 [INFO]: saving network
2025-05-11 04:01:57,352 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 04:01:57,362 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 6 hours, 40 minutes, 52 seconds)
2025-05-11 04:06:17,434 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 04:06:19,212 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 389.14587 ± 126.580
2025-05-11 04:06:19,212 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [393.3129, 436.33954, 106.019806, 369.5445, 601.61426, 527.40436, 376.20377, 347.35513, 438.48694, 295.17758]
2025-05-11 04:06:19,212 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [71.0, 87.0, 21.0, 70.0, 111.0, 102.0, 71.0, 65.0, 83.0, 66.0]
2025-05-11 04:06:19,214 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 6 hours, 38 minutes, 23 seconds)
2025-05-11 04:10:41,344 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 04:10:42,856 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 359.73236 ± 60.503
2025-05-11 04:10:42,856 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [272.62097, 381.77444, 432.93707, 304.46964, 277.10712, 392.0829, 399.25247, 384.35645, 445.36972, 307.3527]
2025-05-11 04:10:42,856 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [51.0, 71.0, 79.0, 56.0, 54.0, 74.0, 73.0, 70.0, 82.0, 57.0]
2025-05-11 04:10:42,859 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 6 hours, 35 minutes, 21 seconds)
2025-05-11 04:15:09,029 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 04:15:10,802 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 375.87100 ± 107.660
2025-05-11 04:15:10,803 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [353.72775, 447.1524, 388.25284, 512.1057, 108.217995, 344.6711, 418.9984, 495.99365, 364.6755, 324.91473]
2025-05-11 04:15:10,803 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [64.0, 89.0, 72.0, 96.0, 21.0, 77.0, 90.0, 91.0, 81.0, 60.0]
2025-05-11 04:15:10,805 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 6 hours, 34 minutes, 9 seconds)
2025-05-11 04:19:34,628 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 04:19:36,504 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 404.72012 ± 122.696
2025-05-11 04:19:36,504 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [597.24243, 279.751, 486.3319, 255.9272, 438.59564, 200.31926, 550.8638, 451.8559, 426.97525, 359.33893]
2025-05-11 04:19:36,504 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [113.0, 60.0, 99.0, 55.0, 81.0, 43.0, 103.0, 87.0, 81.0, 67.0]
2025-05-11 04:19:36,507 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 6 hours, 31 minutes, 2 seconds)
2025-05-11 04:24:00,664 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 04:24:02,720 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 446.02744 ± 75.682
2025-05-11 04:24:02,720 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [413.6138, 522.56177, 520.5199, 438.09326, 366.34647, 531.14716, 384.92474, 303.02618, 529.4543, 450.5869]
2025-05-11 04:24:02,720 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [75.0, 100.0, 110.0, 80.0, 78.0, 101.0, 70.0, 59.0, 101.0, 83.0]
2025-05-11 04:24:02,723 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 6 hours, 28 minutes, 46 seconds)
2025-05-11 04:28:24,764 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 04:28:27,029 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 457.30307 ± 120.805
2025-05-11 04:28:27,029 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [448.12952, 329.8797, 366.57614, 700.98425, 606.9719, 548.384, 372.67773, 395.97543, 486.0135, 317.43857]
2025-05-11 04:28:27,029 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [82.0, 64.0, 71.0, 145.0, 118.0, 120.0, 70.0, 77.0, 92.0, 60.0]
2025-05-11 04:28:27,032 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 6 hours, 25 minutes, 4 seconds)
2025-05-11 04:32:49,653 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 04:32:51,649 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 431.82111 ± 78.895
2025-05-11 04:32:51,649 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [442.8538, 446.39532, 308.0633, 365.6303, 572.8586, 403.44296, 420.603, 468.88568, 543.94366, 345.53458]
2025-05-11 04:32:51,649 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [82.0, 80.0, 58.0, 68.0, 107.0, 76.0, 91.0, 89.0, 101.0, 64.0]
2025-05-11 04:32:51,653 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 6 hours, 20 minutes, 55 seconds)
2025-05-11 04:37:14,674 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 04:37:17,014 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 480.79141 ± 88.474
2025-05-11 04:37:17,014 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [435.4865, 439.65073, 559.45135, 403.9647, 476.58224, 508.29538, 390.0454, 372.12247, 672.93774, 549.37787]
2025-05-11 04:37:17,014 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [79.0, 97.0, 109.0, 75.0, 90.0, 97.0, 71.0, 69.0, 127.0, 103.0]
2025-05-11 04:37:17,014 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1226 [INFO]: New best (480.79) for latency MM1Queue_a033_s075
2025-05-11 04:37:17,014 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1229 [INFO]: saving network
2025-05-11 04:37:17,019 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 04:37:17,029 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 6 hours, 15 minutes, 45 seconds)
2025-05-11 04:41:38,374 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 04:41:40,560 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 451.94638 ± 122.125
2025-05-11 04:41:40,560 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [245.7062, 403.88315, 655.84674, 478.48816, 447.7459, 309.56296, 393.34592, 445.5777, 497.03928, 642.2678]
2025-05-11 04:41:40,560 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [46.0, 87.0, 124.0, 92.0, 86.0, 66.0, 89.0, 84.0, 95.0, 123.0]
2025-05-11 04:41:40,564 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 6 hours, 10 minutes, 44 seconds)
2025-05-11 04:46:03,681 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 04:46:05,420 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 391.10199 ± 39.839
2025-05-11 04:46:05,421 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [331.51852, 399.5983, 383.747, 484.9344, 403.50064, 387.68808, 419.36383, 361.81116, 351.69867, 387.15906]
2025-05-11 04:46:05,421 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [60.0, 73.0, 73.0, 92.0, 73.0, 71.0, 75.0, 66.0, 65.0, 71.0]
2025-05-11 04:46:05,425 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 6 hours, 5 minutes, 56 seconds)
2025-05-11 04:50:29,473 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 04:50:31,231 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 393.66541 ± 57.666
2025-05-11 04:50:31,231 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [301.4075, 342.7913, 518.83936, 404.98373, 364.43927, 455.1911, 353.42044, 397.682, 395.37372, 402.5254]
2025-05-11 04:50:31,231 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [56.0, 62.0, 99.0, 74.0, 67.0, 84.0, 65.0, 72.0, 71.0, 73.0]
2025-05-11 04:50:31,235 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 6 hours, 1 minute, 56 seconds)
2025-05-11 04:54:52,902 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 04:54:54,835 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 406.72955 ± 38.655
2025-05-11 04:54:54,835 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [429.5128, 320.9292, 411.84598, 427.96625, 391.41135, 422.09332, 373.7745, 470.45868, 387.86713, 431.43594]
2025-05-11 04:54:54,835 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [81.0, 69.0, 75.0, 93.0, 73.0, 91.0, 70.0, 102.0, 71.0, 80.0]
2025-05-11 04:54:54,839 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 5 hours, 57 minutes, 15 seconds)
2025-05-11 04:59:18,080 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 04:59:20,318 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 480.94897 ± 132.010
2025-05-11 04:59:20,318 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [357.36337, 639.1904, 574.63104, 501.25366, 322.98483, 649.94476, 512.3284, 490.15472, 222.501, 539.1376]
2025-05-11 04:59:20,318 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [65.0, 138.0, 107.0, 108.0, 67.0, 123.0, 95.0, 88.0, 49.0, 102.0]
2025-05-11 04:59:20,318 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1226 [INFO]: New best (480.95) for latency MM1Queue_a033_s075
2025-05-11 04:59:20,318 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1229 [INFO]: saving network
2025-05-11 04:59:20,322 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 04:59:20,334 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 5 hours, 52 minutes, 52 seconds)
2025-05-11 05:03:48,253 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 05:03:50,334 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 459.03168 ± 44.914
2025-05-11 05:03:50,334 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [429.53885, 543.8074, 436.79535, 493.38214, 465.89255, 514.4428, 430.16498, 446.11053, 450.1814, 380.0009]
2025-05-11 05:03:50,334 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [78.0, 103.0, 79.0, 92.0, 87.0, 95.0, 96.0, 80.0, 81.0, 69.0]
2025-05-11 05:03:50,339 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 5 hours, 50 minutes, 10 seconds)
2025-05-11 05:08:10,201 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 05:08:12,366 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 457.50262 ± 113.972
2025-05-11 05:08:12,366 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [416.9322, 584.8172, 419.59607, 576.09576, 401.79318, 658.8143, 379.38373, 372.7018, 501.50378, 263.388]
2025-05-11 05:08:12,366 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [74.0, 111.0, 78.0, 121.0, 76.0, 144.0, 70.0, 74.0, 106.0, 52.0]
2025-05-11 05:08:12,372 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 5 hours, 45 minutes)
2025-05-11 05:12:36,211 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 05:12:38,399 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 461.42569 ± 71.622
2025-05-11 05:12:38,400 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [424.43832, 561.28577, 474.39233, 436.2537, 335.1071, 505.56525, 575.07983, 467.77124, 463.96948, 370.39352]
2025-05-11 05:12:38,400 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [77.0, 122.0, 105.0, 81.0, 62.0, 95.0, 106.0, 85.0, 86.0, 70.0]
2025-05-11 05:12:38,420 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 5 hours, 40 minutes, 38 seconds)
2025-05-11 05:17:02,499 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 05:17:04,650 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 457.19287 ± 88.690
2025-05-11 05:17:04,650 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [421.47803, 572.5334, 348.99957, 357.88583, 564.16876, 372.02783, 496.50858, 538.2827, 361.72955, 538.31445]
2025-05-11 05:17:04,650 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [77.0, 108.0, 64.0, 65.0, 108.0, 69.0, 106.0, 108.0, 79.0, 102.0]
2025-05-11 05:17:04,655 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 5 hours, 36 minutes, 53 seconds)
2025-05-11 05:21:29,664 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 05:21:32,285 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 526.75769 ± 87.813
2025-05-11 05:21:32,285 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [546.4004, 559.69464, 561.1078, 715.77344, 462.13525, 436.83087, 613.4908, 408.40375, 460.53296, 503.20673]
2025-05-11 05:21:32,285 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [103.0, 106.0, 107.0, 138.0, 85.0, 91.0, 132.0, 91.0, 98.0, 95.0]
2025-05-11 05:21:32,285 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1226 [INFO]: New best (526.76) for latency MM1Queue_a033_s075
2025-05-11 05:21:32,285 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1229 [INFO]: saving network
2025-05-11 05:21:32,289 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 05:21:32,302 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 5 hours, 32 minutes, 59 seconds)
2025-05-11 05:25:57,167 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 05:25:59,296 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 461.26562 ± 93.739
2025-05-11 05:25:59,296 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [422.92703, 429.59097, 240.73688, 481.03745, 615.6553, 508.82367, 482.7035, 470.3928, 413.3118, 547.4768]
2025-05-11 05:25:59,296 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [77.0, 79.0, 44.0, 88.0, 133.0, 93.0, 90.0, 85.0, 79.0, 109.0]
2025-05-11 05:25:59,301 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 5 hours, 27 minutes, 48 seconds)
2025-05-11 05:30:23,477 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 05:30:26,030 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 531.66193 ± 149.231
2025-05-11 05:30:26,030 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [605.44867, 418.83505, 462.48724, 431.37744, 401.594, 569.8349, 404.4032, 465.01138, 659.43396, 898.1936]
2025-05-11 05:30:26,030 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [116.0, 89.0, 89.0, 81.0, 88.0, 110.0, 76.0, 82.0, 128.0, 171.0]
2025-05-11 05:30:26,030 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1226 [INFO]: New best (531.66) for latency MM1Queue_a033_s075
2025-05-11 05:30:26,030 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1229 [INFO]: saving network
2025-05-11 05:30:26,034 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 05:30:26,047 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 5 hours, 24 minutes, 31 seconds)
2025-05-11 05:34:48,658 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 05:34:50,996 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 491.62784 ± 127.706
2025-05-11 05:34:50,996 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [416.13672, 344.22733, 621.4587, 723.0406, 358.67505, 338.0189, 487.5048, 604.15656, 584.0186, 439.04114]
2025-05-11 05:34:50,996 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [76.0, 77.0, 116.0, 139.0, 70.0, 71.0, 97.0, 133.0, 111.0, 80.0]
2025-05-11 05:34:51,002 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 5 hours, 19 minutes, 49 seconds)
2025-05-11 05:39:14,323 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 05:39:16,721 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 515.73077 ± 100.891
2025-05-11 05:39:16,721 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [406.88568, 387.91388, 472.4197, 721.3703, 431.88174, 442.24832, 555.6747, 537.91864, 617.11456, 583.87964]
2025-05-11 05:39:16,721 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [73.0, 73.0, 87.0, 138.0, 80.0, 84.0, 105.0, 101.0, 118.0, 116.0]
2025-05-11 05:39:16,725 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 5 hours, 15 minutes, 15 seconds)
2025-05-11 05:43:39,839 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 05:43:41,980 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 467.35873 ± 134.180
2025-05-11 05:43:41,980 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [518.14246, 686.6308, 532.5406, 402.466, 500.19937, 286.24933, 467.71362, 652.7817, 364.78635, 262.0772]
2025-05-11 05:43:41,980 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [97.0, 132.0, 104.0, 77.0, 93.0, 60.0, 86.0, 124.0, 79.0, 59.0]
2025-05-11 05:43:41,987 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 5 hours, 10 minutes, 15 seconds)
2025-05-11 05:47:59,708 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 05:48:02,417 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 586.64026 ± 155.982
2025-05-11 05:48:02,417 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [410.0947, 654.6082, 489.39148, 829.7327, 396.00446, 492.22885, 714.9297, 828.32367, 600.9601, 450.1291]
2025-05-11 05:48:02,417 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [76.0, 128.0, 87.0, 161.0, 80.0, 91.0, 146.0, 174.0, 128.0, 81.0]
2025-05-11 05:48:02,418 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1226 [INFO]: New best (586.64) for latency MM1Queue_a033_s075
2025-05-11 05:48:02,418 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1229 [INFO]: saving network
2025-05-11 05:48:02,421 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 05:48:02,461 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 5 hours, 4 minutes, 19 seconds)
2025-05-11 05:52:18,814 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 05:52:20,964 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 488.55045 ± 76.995
2025-05-11 05:52:20,964 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [401.4659, 476.28287, 582.835, 375.93942, 543.0664, 612.98987, 506.41357, 500.83112, 384.22745, 501.4527]
2025-05-11 05:52:20,964 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [71.0, 87.0, 110.0, 68.0, 98.0, 120.0, 93.0, 108.0, 72.0, 96.0]
2025-05-11 05:52:20,971 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 4 hours, 58 minutes, 2 seconds)
2025-05-11 05:56:39,824 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 05:56:42,441 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 577.35706 ± 119.367
2025-05-11 05:56:42,441 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [629.97144, 506.54663, 406.6044, 417.1833, 733.15717, 621.17633, 465.06348, 681.0544, 750.84033, 561.9727]
2025-05-11 05:56:42,441 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [113.0, 94.0, 74.0, 79.0, 133.0, 116.0, 85.0, 132.0, 147.0, 102.0]
2025-05-11 05:56:42,448 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 4 hours, 52 minutes, 53 seconds)
2025-05-11 06:01:06,725 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 06:01:09,326 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 531.25720 ± 93.532
2025-05-11 06:01:09,326 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [595.6852, 402.33347, 679.93854, 613.6786, 407.99887, 510.19568, 439.03635, 533.84576, 489.4454, 640.41394]
2025-05-11 06:01:09,326 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [130.0, 90.0, 132.0, 120.0, 91.0, 98.0, 83.0, 105.0, 98.0, 124.0]
2025-05-11 06:01:09,331 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 4 hours, 48 minutes, 46 seconds)
2025-05-11 06:05:29,331 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 06:05:31,828 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 517.26776 ± 130.123
2025-05-11 06:05:31,828 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [483.13287, 443.05148, 545.69904, 697.5208, 610.26117, 513.83777, 340.89346, 671.804, 593.8252, 272.6518]
2025-05-11 06:05:31,828 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [87.0, 80.0, 111.0, 149.0, 118.0, 113.0, 62.0, 128.0, 113.0, 56.0]
2025-05-11 06:05:31,835 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 4 hours, 43 minutes, 48 seconds)
2025-05-11 06:09:56,396 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 06:09:58,946 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 550.51056 ± 101.716
2025-05-11 06:09:58,947 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [603.93774, 567.7376, 439.09888, 560.15564, 452.17627, 795.50586, 509.669, 585.1494, 427.77234, 563.9027]
2025-05-11 06:09:58,947 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [128.0, 104.0, 79.0, 106.0, 81.0, 154.0, 90.0, 109.0, 77.0, 115.0]
2025-05-11 06:09:58,953 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 4 hours, 40 minutes, 51 seconds)
2025-05-11 06:14:23,109 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 06:14:26,187 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 629.25549 ± 149.620
2025-05-11 06:14:26,187 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [607.10175, 493.52817, 494.40826, 764.007, 553.9507, 669.2028, 567.30914, 1014.12604, 542.2983, 586.6229]
2025-05-11 06:14:26,187 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [112.0, 92.0, 88.0, 145.0, 111.0, 144.0, 105.0, 217.0, 100.0, 113.0]
2025-05-11 06:14:26,187 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1226 [INFO]: New best (629.26) for latency MM1Queue_a033_s075
2025-05-11 06:14:26,188 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1229 [INFO]: saving network
2025-05-11 06:14:26,192 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 06:14:26,206 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 4 hours, 38 minutes, 17 seconds)
2025-05-11 06:18:50,038 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 06:18:52,847 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 577.40265 ± 72.244
2025-05-11 06:18:52,847 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [569.6103, 745.20447, 481.487, 556.5709, 582.37866, 668.39856, 517.55286, 548.2573, 565.3566, 539.2101]
2025-05-11 06:18:52,847 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [103.0, 147.0, 87.0, 105.0, 134.0, 129.0, 99.0, 120.0, 115.0, 105.0]
2025-05-11 06:18:52,855 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 4 hours, 34 minutes, 57 seconds)
2025-05-11 06:23:15,396 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 06:23:18,115 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 542.28430 ± 155.561
2025-05-11 06:23:18,115 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [538.70734, 350.59488, 443.34952, 397.32254, 401.30115, 906.1072, 572.05646, 664.07007, 621.4268, 527.90717]
2025-05-11 06:23:18,115 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [102.0, 78.0, 98.0, 77.0, 89.0, 177.0, 115.0, 127.0, 134.0, 117.0]
2025-05-11 06:23:18,123 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 4 hours, 30 minutes, 11 seconds)
2025-05-11 06:27:43,570 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 06:27:45,993 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 527.50482 ± 113.511
2025-05-11 06:27:45,993 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [341.90042, 544.7929, 489.1585, 404.2813, 688.29486, 685.2035, 576.7053, 640.06824, 465.1205, 439.5226]
2025-05-11 06:27:45,993 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [61.0, 101.0, 88.0, 76.0, 124.0, 131.0, 107.0, 129.0, 90.0, 79.0]
2025-05-11 06:27:46,003 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 4 hours, 26 minutes, 50 seconds)
2025-05-11 06:32:09,512 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 06:32:11,782 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 510.67365 ± 97.273
2025-05-11 06:32:11,782 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [376.30392, 433.52222, 432.4309, 732.23517, 447.13312, 583.71466, 502.5173, 478.38843, 561.8842, 558.6067]
2025-05-11 06:32:11,782 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [69.0, 79.0, 76.0, 138.0, 81.0, 109.0, 89.0, 85.0, 104.0, 100.0]
2025-05-11 06:32:11,789 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 4 hours, 22 minutes, 7 seconds)
2025-05-11 06:36:34,674 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 06:36:37,630 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 611.53625 ± 214.301
2025-05-11 06:36:37,630 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [411.52408, 1092.1814, 536.70123, 509.99866, 658.94824, 460.2139, 683.3354, 283.79007, 708.2619, 770.4081]
2025-05-11 06:36:37,630 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [76.0, 232.0, 103.0, 95.0, 121.0, 92.0, 126.0, 65.0, 133.0, 152.0]
2025-05-11 06:36:37,639 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 4 hours, 17 minutes, 24 seconds)
2025-05-11 06:41:01,612 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 06:41:04,509 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 623.81232 ± 59.196
2025-05-11 06:41:04,509 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [598.39594, 522.3906, 675.689, 679.6968, 621.2989, 637.8481, 617.18353, 684.8436, 519.0678, 681.7085]
2025-05-11 06:41:04,509 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [108.0, 101.0, 124.0, 129.0, 116.0, 124.0, 112.0, 130.0, 96.0, 127.0]
2025-05-11 06:41:04,518 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 4 hours, 13 minutes)
2025-05-11 06:45:29,487 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 06:45:32,071 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 559.37109 ± 76.508
2025-05-11 06:45:32,072 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [489.57025, 651.42566, 547.81104, 500.90137, 519.4541, 669.70526, 543.85516, 504.17926, 691.6353, 475.1734]
2025-05-11 06:45:32,072 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [90.0, 121.0, 98.0, 90.0, 94.0, 127.0, 99.0, 94.0, 128.0, 85.0]
2025-05-11 06:45:32,080 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 4 hours, 9 minutes)
2025-05-11 06:49:53,687 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 06:49:55,878 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 497.16205 ± 89.991
2025-05-11 06:49:55,878 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [550.58813, 479.33887, 542.1369, 318.29507, 507.8178, 519.629, 503.6733, 679.6004, 402.17764, 468.36307]
2025-05-11 06:49:55,878 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [101.0, 88.0, 99.0, 60.0, 90.0, 95.0, 89.0, 127.0, 75.0, 83.0]
2025-05-11 06:49:55,887 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 4 hours, 3 minutes, 48 seconds)
2025-05-11 06:54:21,294 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 06:54:23,935 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 579.91016 ± 100.056
2025-05-11 06:54:23,935 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [593.80206, 500.47867, 413.9786, 716.8965, 636.83826, 626.91754, 530.8098, 753.61224, 524.9802, 500.7871]
2025-05-11 06:54:23,935 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [109.0, 91.0, 73.0, 129.0, 117.0, 126.0, 97.0, 139.0, 101.0, 89.0]
2025-05-11 06:54:23,945 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 3 hours, 59 minutes, 47 seconds)
2025-05-11 06:58:51,502 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 06:58:53,891 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 560.00598 ± 91.390
2025-05-11 06:58:53,892 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [430.6484, 507.83072, 451.70074, 663.0615, 531.06824, 546.5416, 574.06, 534.74854, 754.2507, 606.1498]
2025-05-11 06:58:53,892 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [80.0, 90.0, 81.0, 122.0, 98.0, 103.0, 103.0, 96.0, 148.0, 113.0]
2025-05-11 06:58:53,899 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 3 hours, 56 minutes, 4 seconds)
2025-05-11 07:03:09,782 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 07:03:12,648 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 645.29675 ± 195.501
2025-05-11 07:03:12,648 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [620.87085, 799.31067, 315.9525, 546.46954, 509.95566, 985.0712, 540.153, 933.27954, 664.0824, 537.8218]
2025-05-11 07:03:12,648 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [111.0, 146.0, 67.0, 97.0, 91.0, 191.0, 96.0, 196.0, 131.0, 96.0]
2025-05-11 07:03:12,648 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1226 [INFO]: New best (645.30) for latency MM1Queue_a033_s075
2025-05-11 07:03:12,649 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1229 [INFO]: saving network
2025-05-11 07:03:12,653 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 07:03:12,668 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 3 hours, 50 minutes, 12 seconds)
2025-05-11 07:07:30,483 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 07:07:33,186 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 573.99176 ± 80.786
2025-05-11 07:07:33,186 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [530.1162, 627.2876, 503.93317, 580.17346, 503.45685, 559.95496, 535.3058, 762.05035, 486.77194, 650.86707]
2025-05-11 07:07:33,186 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [95.0, 116.0, 98.0, 108.0, 92.0, 103.0, 99.0, 151.0, 87.0, 122.0]
2025-05-11 07:07:33,215 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 3 hours, 44 minutes, 35 seconds)
2025-05-11 07:11:57,269 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 07:11:59,531 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 519.17059 ± 80.021
2025-05-11 07:11:59,531 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [606.5477, 502.92502, 438.9794, 688.75806, 548.17175, 408.84006, 478.68823, 489.70764, 462.6789, 566.40936]
2025-05-11 07:11:59,531 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [111.0, 89.0, 79.0, 127.0, 97.0, 73.0, 84.0, 88.0, 82.0, 104.0]
2025-05-11 07:11:59,541 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 3 hours, 40 minutes, 36 seconds)
2025-05-11 07:16:20,344 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 07:16:22,672 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 522.74792 ± 42.518
2025-05-11 07:16:22,672 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [483.19803, 547.3722, 488.89383, 507.9717, 482.91473, 566.6619, 563.94244, 607.08905, 493.91525, 485.5199]
2025-05-11 07:16:22,672 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [87.0, 98.0, 88.0, 93.0, 86.0, 108.0, 100.0, 113.0, 89.0, 87.0]
2025-05-11 07:16:22,682 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 3 hours, 35 minutes, 23 seconds)
2025-05-11 07:20:44,288 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 07:20:46,611 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 524.85120 ± 52.156
2025-05-11 07:20:46,611 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [640.29175, 480.91962, 514.4473, 505.8094, 539.98486, 600.71106, 499.579, 506.9262, 495.64487, 464.19788]
2025-05-11 07:20:46,611 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [130.0, 89.0, 91.0, 92.0, 96.0, 114.0, 88.0, 90.0, 89.0, 85.0]
2025-05-11 07:20:46,619 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 3 hours, 30 minutes, 2 seconds)
2025-05-11 07:25:10,324 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 07:25:12,687 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 533.48340 ± 50.131
2025-05-11 07:25:12,688 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [571.2768, 606.7709, 576.9089, 461.71674, 517.68115, 597.7146, 519.049, 482.90698, 469.05838, 531.7501]
2025-05-11 07:25:12,688 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [106.0, 114.0, 107.0, 97.0, 93.0, 110.0, 91.0, 87.0, 83.0, 95.0]
2025-05-11 07:25:12,698 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 3 hours, 26 minutes, 48 seconds)
2025-05-11 07:29:33,561 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 07:29:35,913 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 547.42273 ± 57.237
2025-05-11 07:29:35,913 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [567.92816, 554.0082, 510.36554, 648.6642, 607.2794, 570.61017, 544.78076, 427.69733, 542.2149, 500.67856]
2025-05-11 07:29:35,913 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [100.0, 102.0, 93.0, 124.0, 119.0, 118.0, 99.0, 75.0, 97.0, 91.0]
2025-05-11 07:29:35,923 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 3 hours, 22 minutes, 48 seconds)
2025-05-11 07:33:51,377 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 07:33:53,745 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 552.96222 ± 54.919
2025-05-11 07:33:53,746 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [518.14795, 539.351, 568.01294, 553.52014, 623.7816, 595.9988, 509.3179, 471.0091, 497.60168, 652.8809]
2025-05-11 07:33:53,746 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [91.0, 100.0, 102.0, 100.0, 114.0, 111.0, 91.0, 87.0, 93.0, 122.0]
2025-05-11 07:33:53,755 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 3 hours, 17 minutes, 7 seconds)
2025-05-11 07:38:06,888 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 07:38:10,477 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 740.42285 ± 202.074
2025-05-11 07:38:10,477 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [498.69144, 525.2391, 735.85834, 593.6766, 993.8297, 582.9466, 1040.5251, 569.6004, 917.12537, 946.7358]
2025-05-11 07:38:10,477 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [90.0, 108.0, 136.0, 125.0, 192.0, 121.0, 201.0, 105.0, 193.0, 198.0]
2025-05-11 07:38:10,477 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1226 [INFO]: New best (740.42) for latency MM1Queue_a033_s075
2025-05-11 07:38:10,477 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1229 [INFO]: saving network
2025-05-11 07:38:10,481 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 07:38:10,498 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 3 hours, 11 minutes, 48 seconds)
2025-05-11 07:42:27,277 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 07:42:30,010 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 619.39929 ± 208.346
2025-05-11 07:42:30,010 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [604.64545, 579.0003, 1232.2129, 500.71893, 489.2866, 618.94135, 531.23926, 540.3296, 577.84424, 519.77405]
2025-05-11 07:42:30,010 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [107.0, 112.0, 236.0, 94.0, 90.0, 116.0, 96.0, 101.0, 106.0, 98.0]
2025-05-11 07:42:30,040 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 3 hours, 6 minutes, 49 seconds)
2025-05-11 07:46:46,999 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 07:46:50,032 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 671.71765 ± 192.643
2025-05-11 07:46:50,032 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [515.8927, 680.30023, 746.85443, 623.0392, 597.26715, 1158.2866, 558.9736, 515.3152, 832.4276, 488.81967]
2025-05-11 07:46:50,032 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [93.0, 138.0, 137.0, 120.0, 112.0, 228.0, 103.0, 94.0, 158.0, 90.0]
2025-05-11 07:46:50,043 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 3 hours, 1 minute, 37 seconds)
2025-05-11 07:51:03,849 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 07:51:07,088 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 672.29413 ± 95.137
2025-05-11 07:51:07,088 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [720.168, 642.8432, 555.40546, 669.13904, 635.5867, 778.2917, 763.1851, 746.83655, 744.34125, 467.1439]
2025-05-11 07:51:07,088 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [135.0, 122.0, 105.0, 130.0, 126.0, 153.0, 148.0, 154.0, 151.0, 99.0]
2025-05-11 07:51:07,098 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 56 minutes, 27 seconds)
2025-05-11 07:55:26,317 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 07:55:29,977 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 728.30743 ± 202.129
2025-05-11 07:55:29,977 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [638.6312, 775.62756, 599.31384, 732.6879, 1036.265, 483.4061, 539.0053, 520.752, 872.60706, 1084.7781]
2025-05-11 07:55:29,978 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [129.0, 152.0, 113.0, 149.0, 200.0, 90.0, 97.0, 96.0, 173.0, 211.0]
2025-05-11 07:55:29,988 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 2 hours, 52 minutes, 49 seconds)
2025-05-11 07:59:54,514 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 07:59:57,444 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 592.67712 ± 101.387
2025-05-11 07:59:57,444 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [609.42944, 749.2997, 775.70435, 446.129, 558.52734, 609.72253, 511.08527, 510.81598, 640.32324, 515.73376]
2025-05-11 07:59:57,444 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [109.0, 144.0, 148.0, 93.0, 104.0, 124.0, 96.0, 108.0, 125.0, 98.0]
2025-05-11 07:59:57,455 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 2 hours, 49 minutes, 54 seconds)
2025-05-11 08:04:19,253 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 08:04:22,924 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 738.81787 ± 243.212
2025-05-11 08:04:22,925 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [877.4091, 584.94183, 660.2099, 557.4805, 545.86255, 1048.957, 580.64075, 702.86664, 1294.2651, 535.5452]
2025-05-11 08:04:22,925 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [178.0, 118.0, 121.0, 105.0, 98.0, 218.0, 113.0, 133.0, 250.0, 96.0]
2025-05-11 08:04:22,936 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 2 hours, 46 minutes, 18 seconds)
2025-05-11 08:08:47,068 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 08:08:50,396 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 648.10571 ± 119.790
2025-05-11 08:08:50,396 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [596.32263, 539.6595, 574.96136, 546.48083, 676.3051, 834.7331, 628.50916, 524.0476, 896.5866, 663.4512]
2025-05-11 08:08:50,396 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [118.0, 110.0, 117.0, 101.0, 125.0, 171.0, 118.0, 105.0, 173.0, 136.0]
2025-05-11 08:08:50,407 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 2 hours, 42 minutes, 50 seconds)
2025-05-11 08:13:15,878 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 08:13:18,796 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 595.95337 ± 111.986
2025-05-11 08:13:18,797 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [557.39355, 651.6746, 407.25873, 557.5192, 831.5837, 600.2238, 727.88055, 565.3216, 505.43524, 555.2426]
2025-05-11 08:13:18,797 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [105.0, 132.0, 82.0, 112.0, 160.0, 114.0, 138.0, 112.0, 91.0, 108.0]
2025-05-11 08:13:18,809 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 2 hours, 39 minutes, 48 seconds)
2025-05-11 08:17:43,207 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 08:17:46,696 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 712.28595 ± 148.202
2025-05-11 08:17:46,696 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [518.7645, 625.2914, 682.5181, 645.809, 614.1723, 678.79315, 720.9074, 682.9952, 882.74725, 1070.8612]
2025-05-11 08:17:46,696 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [92.0, 118.0, 126.0, 128.0, 114.0, 129.0, 139.0, 139.0, 173.0, 202.0]
2025-05-11 08:17:46,709 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 2 hours, 35 minutes, 57 seconds)
2025-05-11 08:22:10,641 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 08:22:13,690 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 637.61426 ± 187.359
2025-05-11 08:22:13,691 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [630.4947, 690.6348, 586.6487, 434.0035, 451.26617, 622.0914, 564.4157, 515.4457, 759.8772, 1121.2646]
2025-05-11 08:22:13,691 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [114.0, 133.0, 107.0, 86.0, 81.0, 117.0, 103.0, 100.0, 141.0, 232.0]
2025-05-11 08:22:13,725 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 2 hours, 31 minutes, 26 seconds)
2025-05-11 08:26:39,343 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 08:26:42,608 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 673.55603 ± 145.569
2025-05-11 08:26:42,608 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [512.9228, 878.60394, 681.47876, 794.0751, 431.61533, 757.22363, 613.7836, 746.51636, 824.97345, 494.36658]
2025-05-11 08:26:42,609 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [91.0, 173.0, 131.0, 153.0, 79.0, 150.0, 113.0, 155.0, 155.0, 89.0]
2025-05-11 08:26:42,620 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 2 hours, 27 minutes, 21 seconds)
2025-05-11 08:31:15,422 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 08:31:18,842 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 667.61639 ± 139.078
2025-05-11 08:31:18,842 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [507.07825, 846.4673, 635.154, 981.049, 654.4759, 593.28723, 589.7127, 519.541, 710.6114, 638.78705]
2025-05-11 08:31:18,842 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [109.0, 172.0, 126.0, 201.0, 120.0, 110.0, 117.0, 96.0, 132.0, 121.0]
2025-05-11 08:31:18,856 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 2 hours, 23 minutes, 50 seconds)
2025-05-11 08:35:49,036 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 08:35:52,094 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 649.51324 ± 79.729
2025-05-11 08:35:52,094 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [565.1342, 620.6986, 591.6455, 579.34717, 652.3314, 625.25995, 754.46484, 836.8798, 624.98785, 644.38293]
2025-05-11 08:35:52,094 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [109.0, 115.0, 115.0, 115.0, 119.0, 115.0, 148.0, 158.0, 113.0, 116.0]
2025-05-11 08:35:52,106 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 2 hours, 19 minutes, 50 seconds)
2025-05-11 08:40:15,174 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 08:40:18,050 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 597.51080 ± 113.470
2025-05-11 08:40:18,051 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [822.1691, 560.3367, 557.841, 522.8463, 384.5271, 559.46545, 538.62164, 693.8197, 654.80695, 680.67365]
2025-05-11 08:40:18,051 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [168.0, 106.0, 107.0, 113.0, 69.0, 104.0, 99.0, 133.0, 121.0, 127.0]
2025-05-11 08:40:18,063 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 2 hours, 15 minutes, 8 seconds)
2025-05-11 08:44:42,948 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 08:44:45,400 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 547.11194 ± 81.454
2025-05-11 08:44:45,400 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [568.4787, 581.385, 356.4383, 540.4796, 684.4298, 524.615, 580.1434, 600.10956, 558.15234, 476.8873]
2025-05-11 08:44:45,401 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [102.0, 107.0, 64.0, 102.0, 128.0, 94.0, 106.0, 110.0, 103.0, 85.0]
2025-05-11 08:44:45,430 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 2 hours, 10 minutes, 39 seconds)
2025-05-11 08:49:07,659 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 08:49:10,535 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 644.23920 ± 178.023
2025-05-11 08:49:10,535 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [635.1713, 693.713, 548.7547, 594.1971, 541.1555, 929.97516, 572.9807, 403.54022, 1006.4098, 516.4953]
2025-05-11 08:49:10,535 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [122.0, 128.0, 100.0, 109.0, 100.0, 176.0, 104.0, 74.0, 188.0, 92.0]
2025-05-11 08:49:10,547 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 2 hours, 5 minutes, 48 seconds)
2025-05-11 08:53:32,455 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 08:53:35,207 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 600.44934 ± 127.534
2025-05-11 08:53:35,208 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [801.83496, 562.3369, 579.8709, 770.2601, 690.21716, 428.61206, 388.2716, 666.2137, 536.97327, 579.90314]
2025-05-11 08:53:35,208 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [151.0, 103.0, 106.0, 145.0, 131.0, 80.0, 69.0, 124.0, 97.0, 106.0]
2025-05-11 08:53:35,221 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 2 hours, 16 seconds)
2025-05-11 08:57:57,708 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 08:58:00,607 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 620.21716 ± 107.944
2025-05-11 08:58:00,607 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [561.3102, 512.1968, 748.5881, 630.2007, 491.13852, 771.453, 560.2432, 522.39844, 798.8112, 605.83167]
2025-05-11 08:58:00,607 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [100.0, 92.0, 148.0, 116.0, 88.0, 154.0, 101.0, 95.0, 150.0, 111.0]
2025-05-11 08:58:00,620 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 55 minutes, 8 seconds)
2025-05-11 09:02:28,380 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 09:02:31,609 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 671.09637 ± 153.544
2025-05-11 09:02:31,609 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [535.1434, 939.3461, 548.55396, 546.55865, 941.7404, 699.0589, 600.0278, 494.7899, 738.96454, 666.7805]
2025-05-11 09:02:31,609 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [97.0, 182.0, 100.0, 99.0, 181.0, 134.0, 110.0, 90.0, 140.0, 121.0]
2025-05-11 09:02:31,624 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 51 minutes, 7 seconds)
2025-05-11 09:06:57,987 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 09:07:00,561 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 563.63422 ± 73.541
2025-05-11 09:07:00,561 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [551.08356, 490.01874, 523.13245, 624.53204, 486.19318, 475.67697, 535.9086, 646.55505, 707.7288, 595.5127]
2025-05-11 09:07:00,561 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [98.0, 87.0, 96.0, 122.0, 88.0, 85.0, 96.0, 120.0, 130.0, 109.0]
2025-05-11 09:07:00,575 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 46 minutes, 48 seconds)
2025-05-11 09:11:29,643 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 09:11:33,135 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 728.66412 ± 154.212
2025-05-11 09:11:33,135 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [723.23285, 589.56793, 753.10944, 584.3749, 687.7237, 784.2738, 570.6213, 977.6494, 592.7798, 1023.3078]
2025-05-11 09:11:33,135 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [137.0, 113.0, 141.0, 118.0, 128.0, 148.0, 103.0, 183.0, 109.0, 194.0]
2025-05-11 09:11:33,149 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 42 minutes, 55 seconds)
2025-05-11 09:16:00,676 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 09:16:04,034 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 682.09735 ± 169.412
2025-05-11 09:16:04,035 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [593.92896, 1002.304, 774.73517, 588.28394, 519.93884, 985.1994, 655.194, 583.9182, 571.9666, 545.5043]
2025-05-11 09:16:04,035 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [116.0, 201.0, 144.0, 113.0, 95.0, 195.0, 124.0, 106.0, 105.0, 107.0]
2025-05-11 09:16:04,048 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 38 minutes, 54 seconds)
2025-05-11 09:20:26,640 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 09:20:29,185 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 571.37238 ± 94.089
2025-05-11 09:20:29,185 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [606.42224, 530.58105, 523.8951, 542.5516, 390.80817, 595.47095, 566.2042, 558.00604, 789.6876, 610.09686]
2025-05-11 09:20:29,185 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [113.0, 96.0, 112.0, 99.0, 84.0, 109.0, 102.0, 111.0, 147.0, 114.0]
2025-05-11 09:20:29,198 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 34 minutes, 24 seconds)
2025-05-11 09:24:45,363 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 09:24:48,084 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 613.74854 ± 141.914
2025-05-11 09:24:48,084 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [675.9568, 768.8661, 550.0893, 897.22894, 431.867, 471.03024, 557.56525, 498.32855, 736.4947, 550.05884]
2025-05-11 09:24:48,085 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [129.0, 146.0, 100.0, 174.0, 77.0, 86.0, 102.0, 89.0, 136.0, 99.0]
2025-05-11 09:24:48,099 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 29 minutes, 5 seconds)
2025-05-11 09:29:05,930 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 09:29:08,947 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 645.51990 ± 84.617
2025-05-11 09:29:08,947 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [661.9167, 632.2034, 576.6124, 598.6817, 558.0158, 655.00214, 821.25775, 565.9628, 778.7586, 606.78705]
2025-05-11 09:29:08,947 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [132.0, 115.0, 107.0, 109.0, 100.0, 127.0, 165.0, 102.0, 149.0, 110.0]
2025-05-11 09:29:08,962 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 1 hour, 24 minutes, 7 seconds)
2025-05-11 09:33:31,622 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 09:33:34,658 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 642.98828 ± 97.366
2025-05-11 09:33:34,658 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [540.6673, 782.3767, 548.8609, 544.0441, 777.76324, 689.8509, 739.40643, 622.1284, 666.6944, 518.0909]
2025-05-11 09:33:34,659 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [98.0, 144.0, 99.0, 101.0, 162.0, 138.0, 138.0, 118.0, 122.0, 93.0]
2025-05-11 09:33:34,674 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 1 hour, 19 minutes, 17 seconds)
2025-05-11 09:37:56,083 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 09:37:59,169 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 634.56244 ± 164.321
2025-05-11 09:37:59,169 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [556.79956, 544.99084, 601.60126, 1085.3076, 575.5169, 577.1626, 540.85126, 617.3578, 756.36285, 489.6737]
2025-05-11 09:37:59,169 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [101.0, 99.0, 110.0, 213.0, 117.0, 105.0, 97.0, 115.0, 141.0, 94.0]
2025-05-11 09:37:59,185 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 1 hour, 14 minutes, 31 seconds)
2025-05-11 09:42:40,773 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 09:42:43,838 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 628.81628 ± 131.262
2025-05-11 09:42:43,838 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [719.7403, 619.60046, 546.7916, 452.82074, 622.3063, 881.97046, 613.15607, 587.2804, 795.5337, 448.9631]
2025-05-11 09:42:43,838 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [133.0, 119.0, 107.0, 96.0, 120.0, 167.0, 111.0, 115.0, 155.0, 82.0]
2025-05-11 09:42:43,853 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 1 hour, 11 minutes, 10 seconds)
2025-05-11 09:47:12,988 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 09:47:16,064 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 633.94092 ± 159.854
2025-05-11 09:47:16,064 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [612.6732, 560.45135, 497.95438, 453.07492, 639.1473, 562.3642, 975.2357, 561.67456, 896.19794, 580.6358]
2025-05-11 09:47:16,064 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [110.0, 100.0, 90.0, 97.0, 134.0, 101.0, 183.0, 101.0, 176.0, 106.0]
2025-05-11 09:47:16,080 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 1 hour, 7 minutes, 23 seconds)
2025-05-11 09:51:36,163 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 09:51:38,766 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 551.40179 ± 55.752
2025-05-11 09:51:38,766 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [587.66034, 596.53595, 524.9878, 566.9594, 579.126, 640.63715, 584.9377, 463.8438, 498.10916, 471.22012]
2025-05-11 09:51:38,766 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [115.0, 108.0, 95.0, 102.0, 112.0, 126.0, 108.0, 98.0, 105.0, 84.0]
2025-05-11 09:51:38,782 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 1 hour, 2 minutes, 59 seconds)
2025-05-11 09:56:02,594 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 09:56:05,704 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 648.50452 ± 187.087
2025-05-11 09:56:05,704 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [515.0821, 679.78796, 964.56995, 556.8104, 410.2144, 1001.7191, 561.4073, 502.15045, 567.98895, 725.3143]
2025-05-11 09:56:05,704 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [93.0, 124.0, 197.0, 112.0, 73.0, 205.0, 100.0, 89.0, 102.0, 136.0]
2025-05-11 09:56:05,719 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 58 minutes, 32 seconds)
2025-05-11 10:00:30,105 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 10:00:32,716 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 586.44073 ± 73.085
2025-05-11 10:00:32,716 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [616.14124, 539.7109, 554.3365, 520.92865, 769.8397, 552.7629, 576.50037, 532.85034, 659.32666, 542.0103]
2025-05-11 10:00:32,716 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [118.0, 96.0, 99.0, 94.0, 141.0, 99.0, 103.0, 96.0, 120.0, 98.0]
2025-05-11 10:00:32,732 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 54 minutes, 8 seconds)
2025-05-11 10:04:54,767 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 10:04:58,003 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 665.49622 ± 186.015
2025-05-11 10:04:58,003 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [434.20917, 560.66516, 1115.913, 564.44354, 823.7027, 641.5488, 642.70026, 571.7214, 778.5829, 521.47546]
2025-05-11 10:04:58,003 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [86.0, 103.0, 223.0, 102.0, 157.0, 117.0, 117.0, 103.0, 160.0, 93.0]
2025-05-11 10:04:58,019 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 48 minutes, 55 seconds)
2025-05-11 10:09:21,502 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 10:09:24,513 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 660.20782 ± 211.111
2025-05-11 10:09:24,513 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [611.432, 468.25336, 964.63275, 602.2073, 760.01, 1092.5854, 425.1269, 584.61273, 431.02997, 662.1878]
2025-05-11 10:09:24,513 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [117.0, 84.0, 180.0, 121.0, 144.0, 205.0, 77.0, 113.0, 80.0, 131.0]
2025-05-11 10:09:24,529 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 44 minutes, 16 seconds)
2025-05-11 10:13:40,296 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 10:13:43,273 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 661.19177 ± 118.274
2025-05-11 10:13:43,274 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [774.8513, 435.29395, 708.4161, 680.5137, 585.07697, 903.98395, 605.97217, 662.52014, 668.2248, 587.0648]
2025-05-11 10:13:43,274 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [140.0, 79.0, 132.0, 145.0, 116.0, 175.0, 117.0, 118.0, 122.0, 106.0]
2025-05-11 10:13:43,289 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 39 minutes, 44 seconds)
2025-05-11 10:18:09,216 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 10:18:11,928 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 596.40247 ± 103.736
2025-05-11 10:18:11,928 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [572.0013, 544.9654, 603.77673, 671.866, 823.84607, 540.3504, 442.30807, 503.77798, 697.9871, 563.1453]
2025-05-11 10:18:11,928 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [103.0, 97.0, 108.0, 138.0, 155.0, 96.0, 79.0, 90.0, 125.0, 101.0]
2025-05-11 10:18:11,944 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 35 minutes, 21 seconds)
2025-05-11 10:22:32,635 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 10:22:35,616 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 661.45123 ± 127.329
2025-05-11 10:22:35,616 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [515.6928, 720.61475, 665.8935, 525.56525, 610.6659, 929.1871, 826.4589, 673.47833, 603.3471, 543.6085]
2025-05-11 10:22:35,616 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [91.0, 132.0, 118.0, 96.0, 108.0, 173.0, 149.0, 122.0, 112.0, 97.0]
2025-05-11 10:22:35,633 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 30 minutes, 52 seconds)
2025-05-11 10:27:01,852 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 10:27:04,778 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 688.56866 ± 93.068
2025-05-11 10:27:04,779 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [821.54767, 532.5268, 585.7969, 577.9429, 730.2738, 656.6486, 747.97546, 693.48425, 735.8797, 803.6102]
2025-05-11 10:27:04,779 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [151.0, 95.0, 104.0, 102.0, 129.0, 124.0, 134.0, 124.0, 136.0, 144.0]
2025-05-11 10:27:04,794 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 26 minutes, 32 seconds)
2025-05-11 10:31:23,787 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 10:31:26,934 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 691.81702 ± 160.857
2025-05-11 10:31:26,934 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [596.946, 835.6268, 664.1755, 1092.6603, 646.76416, 637.0855, 750.9894, 499.84845, 643.32184, 550.75287]
2025-05-11 10:31:26,934 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [108.0, 155.0, 123.0, 218.0, 132.0, 116.0, 150.0, 91.0, 121.0, 99.0]
2025-05-11 10:31:26,950 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 22 minutes, 2 seconds)
2025-05-11 10:35:55,219 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 10:35:58,675 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 720.22632 ± 130.598
2025-05-11 10:35:58,676 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [821.1934, 751.20264, 652.89014, 908.40436, 651.99255, 583.18195, 616.9295, 706.7509, 958.29407, 551.42395]
2025-05-11 10:35:58,676 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [161.0, 155.0, 131.0, 188.0, 133.0, 106.0, 124.0, 127.0, 183.0, 99.0]
2025-05-11 10:35:58,691 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 17 minutes, 48 seconds)
2025-05-11 10:40:46,914 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 10:40:51,303 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 857.58234 ± 220.186
2025-05-11 10:40:51,304 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [1026.149, 746.33716, 1229.5525, 601.6742, 988.7926, 682.13495, 643.6059, 671.34827, 799.50104, 1186.7275]
2025-05-11 10:40:51,304 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [198.0, 138.0, 233.0, 124.0, 201.0, 127.0, 117.0, 123.0, 146.0, 232.0]
2025-05-11 10:40:51,304 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1226 [INFO]: New best (857.58) for latency MM1Queue_a033_s075
2025-05-11 10:40:51,304 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1229 [INFO]: saving network
2025-05-11 10:40:51,309 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-sac-aug-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 10:40:51,335 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 13 minutes, 35 seconds)
2025-05-11 10:45:24,991 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 10:45:28,432 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 679.14124 ± 157.796
2025-05-11 10:45:28,432 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [656.4195, 789.43115, 643.59, 1018.7151, 740.5747, 644.157, 664.9621, 700.8829, 574.7387, 357.94083]
2025-05-11 10:45:28,432 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [120.0, 147.0, 129.0, 208.0, 151.0, 119.0, 131.0, 142.0, 104.0, 66.0]
2025-05-11 10:45:28,451 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 9 minutes, 9 seconds)
2025-05-11 10:49:51,827 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 10:49:55,591 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 749.04517 ± 126.724
2025-05-11 10:49:55,592 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [662.3366, 752.5504, 908.1719, 580.3207, 911.4196, 893.4687, 717.7854, 560.4866, 839.9814, 663.9307]
2025-05-11 10:49:55,592 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [124.0, 141.0, 174.0, 107.0, 168.0, 168.0, 134.0, 113.0, 156.0, 135.0]
2025-05-11 10:49:55,611 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 4 minutes, 34 seconds)
2025-05-11 10:54:20,032 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 10:54:24,053 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1221 [DEBUG]: Total Reward: 830.59735 ± 155.335
2025-05-11 10:54:24,054 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1222 [DEBUG]: All rewards: [1101.7463, 1126.7693, 705.9894, 685.7145, 739.98596, 730.9457, 904.6639, 694.079, 796.824, 819.2556]
2025-05-11 10:54:24,054 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [214.0, 221.0, 138.0, 127.0, 136.0, 139.0, 174.0, 126.0, 146.0, 150.0]
2025-05-11 10:54:24,071 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-humanoid):1251 [DEBUG]: Training session finished
