2025-05-08 09:15:36,560 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1006 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noisy-walker2d/ExtremeSparseL4U32-sac-aug-mem4
2025-05-08 09:15:36,560 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1007 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noisy-walker2d/ExtremeSparseL4U32-sac-aug-mem4
2025-05-08 09:15:36,560 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1008 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x7baa783c3f10>}
2025-05-08 09:15:36,560 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1009 [DEBUG]: using device: cpu
2025-05-08 09:15:36,560 baseline-sac-noisy-walker2d:77 [WARNING]: args.memorize_actions != args.horizon: 4 != 32
2025-05-08 09:15:36,568 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1031 [INFO]: Creating new trainer
2025-05-08 09:15:36,577 baseline-sac-noisy-walker2d:111 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=41, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-05-08 09:15:36,577 baseline-sac-noisy-walker2d:112 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=47, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-08 09:15:36,771 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1092 [DEBUG]: Starting training session...
2025-05-08 09:15:36,771 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 1/100
2025-05-08 09:18:05,095 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 09:18:06,001 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 9.81781 ± 9.838
2025-05-08 09:18:06,001 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [18.279057, 13.147615, 22.222622, 20.131966, 9.198876, 12.66293, -13.006291, 1.0650632, 5.5671086, 8.909175]
2025-05-08 09:18:06,001 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [86.0, 85.0, 86.0, 87.0, 90.0, 86.0, 73.0, 71.0, 79.0, 76.0]
2025-05-08 09:18:06,001 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1124 [INFO]: New best (9.82) for latency ExtremeSparseL4U32
2025-05-08 09:18:06,002 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1127 [INFO]: saving network
2025-05-08 09:18:06,006 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-walker2d/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 09:18:06,011 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 2/100 (estimated time remaining: 4 hours, 6 minutes, 14 seconds)
2025-05-08 09:20:49,323 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 09:20:50,793 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 37.45040 ± 62.260
2025-05-08 09:20:50,794 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [-45.438602, -6.65291, 169.27979, -24.1511, -3.141073, 73.86929, 21.423134, 41.579594, 35.965267, 111.77059]
2025-05-08 09:20:50,794 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [210.0, 112.0, 201.0, 100.0, 98.0, 143.0, 71.0, 118.0, 124.0, 122.0]
2025-05-08 09:20:50,794 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1124 [INFO]: New best (37.45) for latency ExtremeSparseL4U32
2025-05-08 09:20:50,794 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1127 [INFO]: saving network
2025-05-08 09:20:50,798 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-walker2d/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 09:20:50,804 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 3/100 (estimated time remaining: 4 hours, 16 minutes, 27 seconds)
2025-05-08 09:23:35,160 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 09:23:36,846 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 197.71751 ± 115.595
2025-05-08 09:23:36,846 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [250.2012, 356.49097, 208.15462, 328.9091, 289.28287, 22.954159, 39.080963, 273.95178, 125.24179, 82.90777]
2025-05-08 09:23:36,846 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [132.0, 228.0, 116.0, 218.0, 171.0, 39.0, 153.0, 154.0, 126.0, 79.0]
2025-05-08 09:23:36,846 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1124 [INFO]: New best (197.72) for latency ExtremeSparseL4U32
2025-05-08 09:23:36,846 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1127 [INFO]: saving network
2025-05-08 09:23:36,850 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-walker2d/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 09:23:36,856 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 4/100 (estimated time remaining: 4 hours, 18 minutes, 42 seconds)
2025-05-08 09:26:23,753 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 09:26:25,455 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 223.06433 ± 127.564
2025-05-08 09:26:25,455 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [276.33594, 75.658554, 262.88467, 25.793781, 291.58688, 384.04214, 254.07112, 269.60342, 18.720345, 371.9465]
2025-05-08 09:26:25,455 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [153.0, 154.0, 133.0, 43.0, 165.0, 244.0, 127.0, 148.0, 27.0, 236.0]
2025-05-08 09:26:25,455 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1124 [INFO]: New best (223.06) for latency ExtremeSparseL4U32
2025-05-08 09:26:25,456 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1127 [INFO]: saving network
2025-05-08 09:26:25,459 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-walker2d/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 09:26:25,466 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 5/100 (estimated time remaining: 4 hours, 19 minutes, 28 seconds)
2025-05-08 09:29:16,524 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 09:29:18,944 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 313.63507 ± 94.160
2025-05-08 09:29:18,944 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [325.25732, 377.48737, 269.7291, 303.65317, 292.78293, 324.3994, 329.7873, 113.01749, 515.0195, 285.21667]
2025-05-08 09:29:18,944 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [197.0, 216.0, 147.0, 168.0, 168.0, 197.0, 193.0, 163.0, 381.0, 154.0]
2025-05-08 09:29:18,944 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1124 [INFO]: New best (313.64) for latency ExtremeSparseL4U32
2025-05-08 09:29:18,944 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1127 [INFO]: saving network
2025-05-08 09:29:18,949 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-walker2d/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 09:29:18,955 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 6/100 (estimated time remaining: 4 hours, 20 minutes, 21 seconds)
2025-05-08 09:32:08,967 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 09:32:10,922 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 232.24454 ± 177.057
2025-05-08 09:32:10,922 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [382.66238, 277.34497, 127.99006, 545.55237, 335.58948, 420.2691, 48.006153, 30.68818, 151.67879, 2.6637192]
2025-05-08 09:32:10,922 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [192.0, 225.0, 187.0, 338.0, 163.0, 197.0, 72.0, 46.0, 238.0, 18.0]
2025-05-08 09:32:10,923 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 7/100 (estimated time remaining: 4 hours, 24 minutes, 44 seconds)
2025-05-08 09:34:42,744 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 09:34:44,764 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 215.76485 ± 119.732
2025-05-08 09:34:44,764 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [166.49039, 49.7148, 299.53275, 241.71185, 105.29471, 118.121254, 86.34537, 396.93497, 342.23276, 351.26956]
2025-05-08 09:34:44,764 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [228.0, 120.0, 173.0, 178.0, 69.0, 169.0, 156.0, 232.0, 190.0, 200.0]
2025-05-08 09:34:44,766 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 8/100 (estimated time remaining: 4 hours, 18 minutes, 31 seconds)
2025-05-08 09:37:31,178 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 09:37:33,640 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 245.79460 ± 179.691
2025-05-08 09:37:33,641 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [67.31152, 48.7602, 443.3249, 509.34818, 243.3631, 497.56964, 359.02573, 108.221535, 103.96406, 77.0571]
2025-05-08 09:37:33,641 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [123.0, 106.0, 229.0, 300.0, 321.0, 278.0, 232.0, 164.0, 155.0, 131.0]
2025-05-08 09:37:33,643 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 9/100 (estimated time remaining: 4 hours, 16 minutes, 36 seconds)
2025-05-08 09:40:16,139 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 09:40:18,790 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 330.08221 ± 154.205
2025-05-08 09:40:18,790 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [338.67624, 111.53174, 528.43665, 381.55453, 365.7804, 523.9043, 413.5409, 412.34067, 93.462555, 131.59404]
2025-05-08 09:40:18,791 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [172.0, 166.0, 259.0, 197.0, 198.0, 335.0, 252.0, 228.0, 207.0, 198.0]
2025-05-08 09:40:18,792 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1124 [INFO]: New best (330.08) for latency ExtremeSparseL4U32
2025-05-08 09:40:18,792 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1127 [INFO]: saving network
2025-05-08 09:40:18,796 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-walker2d/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 09:40:18,803 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 10/100 (estimated time remaining: 4 hours, 12 minutes, 46 seconds)
2025-05-08 09:43:04,710 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 09:43:07,511 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 317.00107 ± 122.107
2025-05-08 09:43:07,511 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [336.5821, 140.42561, 404.77597, 414.0575, 401.04837, 182.19812, 126.639534, 498.48697, 385.33206, 280.46445]
2025-05-08 09:43:07,511 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [213.0, 107.0, 220.0, 246.0, 237.0, 251.0, 178.0, 265.0, 256.0, 350.0]
2025-05-08 09:43:07,514 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 11/100 (estimated time remaining: 4 hours, 8 minutes, 34 seconds)
2025-05-08 09:45:52,274 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 09:45:54,450 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 249.76733 ± 111.921
2025-05-08 09:45:54,450 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [122.57931, 169.37242, 445.5768, 308.22934, 101.17364, 219.10326, 375.5687, 149.35674, 247.72003, 358.99326]
2025-05-08 09:45:54,450 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [184.0, 250.0, 280.0, 160.0, 80.0, 112.0, 224.0, 221.0, 122.0, 196.0]
2025-05-08 09:45:54,452 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 12/100 (estimated time remaining: 4 hours, 4 minutes, 18 seconds)
2025-05-08 09:48:41,303 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 09:48:43,668 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 398.19342 ± 206.349
2025-05-08 09:48:43,668 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [27.819004, 28.378668, 296.70938, 426.71094, 483.61957, 581.42, 506.71143, 415.10696, 606.15405, 609.30444]
2025-05-08 09:48:43,668 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [44.0, 46.0, 176.0, 191.0, 232.0, 289.0, 240.0, 199.0, 287.0, 272.0]
2025-05-08 09:48:43,668 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1124 [INFO]: New best (398.19) for latency ExtremeSparseL4U32
2025-05-08 09:48:43,668 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1127 [INFO]: saving network
2025-05-08 09:48:43,673 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-walker2d/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 09:48:43,681 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 13/100 (estimated time remaining: 4 hours, 6 minutes, 4 seconds)
2025-05-08 09:51:27,845 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 09:51:30,778 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 356.60666 ± 188.703
2025-05-08 09:51:30,778 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [440.76434, 131.17924, 306.98105, 30.913895, 681.9883, 452.52972, 267.53525, 479.4169, 229.79457, 544.9632]
2025-05-08 09:51:30,778 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [224.0, 197.0, 179.0, 57.0, 535.0, 224.0, 148.0, 244.0, 340.0, 265.0]
2025-05-08 09:51:30,781 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 14/100 (estimated time remaining: 4 hours, 2 minutes, 46 seconds)
2025-05-08 09:54:14,599 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 09:54:17,433 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 487.09479 ± 148.590
2025-05-08 09:54:17,433 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [290.335, 595.6774, 603.0774, 499.94113, 385.6406, 494.51282, 514.2806, 744.5337, 533.6013, 209.34695]
2025-05-08 09:54:17,433 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [155.0, 272.0, 277.0, 221.0, 183.0, 231.0, 255.0, 337.0, 305.0, 149.0]
2025-05-08 09:54:17,434 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1124 [INFO]: New best (487.09) for latency ExtremeSparseL4U32
2025-05-08 09:54:17,434 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1127 [INFO]: saving network
2025-05-08 09:54:17,438 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-walker2d/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 09:54:17,445 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 15/100 (estimated time remaining: 4 hours, 24 seconds)
2025-05-08 09:57:01,412 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 09:57:04,113 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 376.63565 ± 168.068
2025-05-08 09:57:04,114 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [438.99918, 78.42262, 521.3838, 485.5041, 272.1607, 396.43195, 482.25516, 562.34906, 71.15227, 457.69742]
2025-05-08 09:57:04,114 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [220.0, 115.0, 268.0, 259.0, 240.0, 205.0, 271.0, 307.0, 118.0, 236.0]
2025-05-08 09:57:04,116 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 16/100 (estimated time remaining: 3 hours, 57 minutes, 2 seconds)
2025-05-08 09:59:52,719 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 09:59:55,822 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 361.85962 ± 202.777
2025-05-08 09:59:55,823 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [330.83136, 13.208245, 675.0805, 351.5696, 353.15594, 612.3925, 181.26434, 500.9572, 115.66449, 484.47205]
2025-05-08 09:59:55,823 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [205.0, 26.0, 306.0, 169.0, 230.0, 309.0, 135.0, 242.0, 164.0, 298.0]
2025-05-08 09:59:55,826 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 17/100 (estimated time remaining: 3 hours, 55 minutes, 35 seconds)
2025-05-08 10:03:20,342 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 10:03:23,499 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 407.54013 ± 312.349
2025-05-08 10:03:23,499 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [40.62723, 20.863815, 294.83582, 905.6621, 509.82132, 817.90656, 424.4713, 715.8766, 20.845974, 324.49075]
2025-05-08 10:03:23,499 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [59.0, 32.0, 164.0, 339.0, 279.0, 389.0, 256.0, 357.0, 36.0, 192.0]
2025-05-08 10:03:23,503 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 18/100 (estimated time remaining: 4 hours, 3 minutes, 25 seconds)
2025-05-08 10:06:13,613 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 10:06:17,028 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 451.14307 ± 299.238
2025-05-08 10:06:17,028 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [276.28763, 311.9785, 40.012505, 413.8398, 1236.5902, 609.96173, 403.4296, 419.47107, 510.3811, 289.4784]
2025-05-08 10:06:17,028 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [171.0, 189.0, 56.0, 323.0, 549.0, 458.0, 239.0, 235.0, 368.0, 167.0]
2025-05-08 10:06:17,030 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 19/100 (estimated time remaining: 4 hours, 2 minutes, 14 seconds)
2025-05-08 10:09:02,324 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 10:09:05,980 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 667.05676 ± 191.783
2025-05-08 10:09:05,980 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [552.3707, 475.88156, 683.8937, 692.6285, 801.1842, 345.99026, 852.4649, 482.75897, 775.88434, 1007.5102]
2025-05-08 10:09:05,980 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [256.0, 202.0, 297.0, 299.0, 359.0, 213.0, 341.0, 265.0, 359.0, 415.0]
2025-05-08 10:09:05,980 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1124 [INFO]: New best (667.06) for latency ExtremeSparseL4U32
2025-05-08 10:09:05,981 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1127 [INFO]: saving network
2025-05-08 10:09:05,986 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-walker2d/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 10:09:05,994 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 20/100 (estimated time remaining: 3 hours, 59 minutes, 54 seconds)
2025-05-08 10:11:52,385 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 10:11:55,252 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 419.99408 ± 326.699
2025-05-08 10:11:55,252 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [497.41022, 85.68058, 36.548794, 429.1651, 41.553764, 30.436964, 673.77673, 745.7134, 852.7916, 806.8637]
2025-05-08 10:11:55,252 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [253.0, 135.0, 68.0, 221.0, 64.0, 51.0, 364.0, 333.0, 482.0, 378.0]
2025-05-08 10:11:55,255 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 21/100 (estimated time remaining: 3 hours, 57 minutes, 38 seconds)
2025-05-08 10:15:00,930 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 10:15:06,227 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 758.05450 ± 316.561
2025-05-08 10:15:06,227 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [691.7508, 559.4355, 956.14575, 338.97607, 1411.1974, 828.2599, 725.148, 979.0615, 252.53665, 838.0338]
2025-05-08 10:15:06,227 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [309.0, 236.0, 380.0, 169.0, 647.0, 329.0, 300.0, 379.0, 317.0, 320.0]
2025-05-08 10:15:06,228 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1124 [INFO]: New best (758.05) for latency ExtremeSparseL4U32
2025-05-08 10:15:06,228 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1127 [INFO]: saving network
2025-05-08 10:15:06,233 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-walker2d/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 10:15:06,244 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 22/100 (estimated time remaining: 3 hours, 59 minutes, 44 seconds)
2025-05-08 10:18:12,478 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 10:18:17,221 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 829.26874 ± 416.634
2025-05-08 10:18:17,221 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [1537.8744, 796.4932, 886.7438, 1009.47266, 318.62506, 519.3805, 663.4408, 311.3218, 1554.6627, 694.6729]
2025-05-08 10:18:17,221 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [596.0, 349.0, 383.0, 426.0, 198.0, 257.0, 342.0, 187.0, 685.0, 327.0]
2025-05-08 10:18:17,221 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1124 [INFO]: New best (829.27) for latency ExtremeSparseL4U32
2025-05-08 10:18:17,222 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1127 [INFO]: saving network
2025-05-08 10:18:17,226 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-walker2d/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 10:18:17,234 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 23/100 (estimated time remaining: 3 hours, 52 minutes, 22 seconds)
2025-05-08 10:21:05,527 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 10:21:11,094 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 760.63556 ± 323.866
2025-05-08 10:21:11,095 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [601.02454, 1526.8282, 381.5089, 753.1196, 818.77875, 836.37256, 817.34344, 843.0339, 238.52524, 789.81995]
2025-05-08 10:21:11,095 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [266.0, 595.0, 266.0, 333.0, 353.0, 411.0, 337.0, 374.0, 282.0, 371.0]
2025-05-08 10:21:11,099 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 24/100 (estimated time remaining: 3 hours, 49 minutes, 28 seconds)
2025-05-08 10:24:37,030 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 10:24:41,605 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 664.49622 ± 392.446
2025-05-08 10:24:41,605 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [1201.4358, 319.27942, 855.54865, 1015.806, 355.24326, 828.2787, 912.576, 993.83044, 96.65894, 66.305176]
2025-05-08 10:24:41,605 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [426.0, 170.0, 342.0, 401.0, 201.0, 303.0, 411.0, 493.0, 145.0, 81.0]
2025-05-08 10:24:41,610 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 25/100 (estimated time remaining: 3 hours, 57 minutes, 1 second)
2025-05-08 10:28:12,428 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 10:28:16,656 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 500.74860 ± 364.378
2025-05-08 10:28:16,656 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [790.10156, 69.686806, 314.72015, 871.29224, 921.43164, 84.58293, 844.14935, 863.2775, 85.05908, 163.18442]
2025-05-08 10:28:16,656 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [344.0, 110.0, 190.0, 446.0, 385.0, 109.0, 334.0, 439.0, 108.0, 304.0]
2025-05-08 10:28:16,661 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 26/100 (estimated time remaining: 4 hours, 5 minutes, 21 seconds)
2025-05-08 10:31:35,478 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 10:31:40,845 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 678.91217 ± 338.286
2025-05-08 10:31:40,845 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [468.21423, 869.58997, 235.97943, 1373.8446, 244.73296, 682.90094, 1009.68005, 408.37027, 791.7913, 704.0177]
2025-05-08 10:31:40,845 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [293.0, 447.0, 133.0, 735.0, 198.0, 322.0, 372.0, 207.0, 365.0, 329.0]
2025-05-08 10:31:40,851 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 27/100 (estimated time remaining: 4 hours, 5 minutes, 20 seconds)
2025-05-08 10:35:03,841 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 10:35:10,337 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 820.15686 ± 348.090
2025-05-08 10:35:10,337 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [1289.0802, 179.76233, 800.7065, 1087.2308, 215.82031, 981.3487, 994.5858, 721.1215, 831.7702, 1100.1421]
2025-05-08 10:35:10,337 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [1000.0, 233.0, 311.0, 434.0, 151.0, 405.0, 397.0, 295.0, 320.0, 461.0]
2025-05-08 10:35:10,342 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 28/100 (estimated time remaining: 4 hours, 6 minutes, 31 seconds)
2025-05-08 10:38:28,087 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 10:38:31,683 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 429.03271 ± 326.396
2025-05-08 10:38:31,683 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [181.30219, 598.81415, 829.847, 866.1222, 945.00525, 193.99847, 111.07964, 311.49103, 82.65118, 170.01613]
2025-05-08 10:38:31,683 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [228.0, 319.0, 353.0, 402.0, 339.0, 235.0, 154.0, 424.0, 130.0, 201.0]
2025-05-08 10:38:31,688 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 29/100 (estimated time remaining: 4 hours, 9 minutes, 44 seconds)
2025-05-08 10:41:49,887 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 10:41:53,779 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 389.95679 ± 301.297
2025-05-08 10:41:53,779 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [23.919535, 615.023, 761.7716, 627.78564, 704.0605, 724.5041, 172.06615, 82.21629, 77.73027, 110.49085]
2025-05-08 10:41:53,780 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [35.0, 353.0, 323.0, 350.0, 449.0, 331.0, 271.0, 126.0, 121.0, 172.0]
2025-05-08 10:41:53,785 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 30/100 (estimated time remaining: 4 hours, 4 minutes, 16 seconds)
2025-05-08 10:45:18,171 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 10:45:22,815 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 705.52673 ± 262.885
2025-05-08 10:45:22,815 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [669.0133, 307.49167, 742.2396, 679.4723, 495.97766, 1043.7635, 708.83234, 968.5441, 331.7844, 1108.1492]
2025-05-08 10:45:22,816 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [293.0, 166.0, 311.0, 296.0, 225.0, 445.0, 307.0, 401.0, 175.0, 440.0]
2025-05-08 10:45:22,821 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 31/100 (estimated time remaining: 3 hours, 59 minutes, 26 seconds)
2025-05-08 10:48:49,530 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 10:48:52,937 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 438.51813 ± 300.108
2025-05-08 10:48:52,937 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [16.15438, 1032.7386, 244.28024, 12.947249, 229.01282, 546.0078, 661.58075, 468.53418, 595.4462, 578.47864]
2025-05-08 10:48:52,937 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [26.0, 441.0, 152.0, 24.0, 199.0, 276.0, 318.0, 227.0, 311.0, 289.0]
2025-05-08 10:48:52,943 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 32/100 (estimated time remaining: 3 hours, 57 minutes, 22 seconds)
2025-05-08 10:52:18,968 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 10:52:25,940 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 990.69482 ± 281.115
2025-05-08 10:52:25,940 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [1371.4586, 959.07825, 1627.7063, 706.34814, 851.8608, 774.32715, 790.49255, 770.59534, 1040.0155, 1015.0652]
2025-05-08 10:52:25,940 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [537.0, 420.0, 726.0, 308.0, 535.0, 329.0, 367.0, 312.0, 436.0, 417.0]
2025-05-08 10:52:25,941 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1124 [INFO]: New best (990.69) for latency ExtremeSparseL4U32
2025-05-08 10:52:25,941 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1127 [INFO]: saving network
2025-05-08 10:52:25,946 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-walker2d/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 10:52:25,959 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 33/100 (estimated time remaining: 3 hours, 54 minutes, 44 seconds)
2025-05-08 10:55:46,633 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 10:55:52,632 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 895.46680 ± 415.365
2025-05-08 10:55:52,632 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [691.3331, 974.27515, 1351.6241, 844.8265, 1051.7806, 1314.5146, 1160.5256, 327.17212, 18.900652, 1219.7145]
2025-05-08 10:55:52,632 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [317.0, 378.0, 536.0, 377.0, 404.0, 564.0, 479.0, 206.0, 34.0, 513.0]
2025-05-08 10:55:52,638 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 34/100 (estimated time remaining: 3 hours, 52 minutes, 28 seconds)
2025-05-08 10:58:46,437 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 10:58:50,954 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 816.50714 ± 408.037
2025-05-08 10:58:50,955 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [771.2736, 813.755, 316.23276, 137.17079, 640.9585, 1643.3337, 902.8338, 826.81366, 1295.1481, 817.551]
2025-05-08 10:58:50,955 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [338.0, 345.0, 187.0, 146.0, 333.0, 684.0, 363.0, 342.0, 495.0, 362.0]
2025-05-08 10:58:50,960 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 35/100 (estimated time remaining: 3 hours, 43 minutes, 46 seconds)
2025-05-08 11:01:35,387 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 11:01:38,864 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 597.97180 ± 747.171
2025-05-08 11:01:38,864 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [21.277279, 716.64935, 29.516832, 22.854805, 274.20358, 2449.0105, 1503.2876, 331.87003, 293.74237, 337.3056]
2025-05-08 11:01:38,865 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [41.0, 295.0, 55.0, 48.0, 151.0, 942.0, 577.0, 195.0, 165.0, 196.0]
2025-05-08 11:01:38,869 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 36/100 (estimated time remaining: 3 hours, 31 minutes, 28 seconds)
2025-05-08 11:04:24,775 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 11:04:28,120 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 567.76489 ± 478.560
2025-05-08 11:04:28,120 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [855.5398, 451.13312, 807.7447, 1113.9021, 29.91388, 37.603493, 15.882384, 1370.1174, 104.54458, 891.26764]
2025-05-08 11:04:28,120 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [364.0, 256.0, 352.0, 484.0, 54.0, 58.0, 27.0, 480.0, 217.0, 390.0]
2025-05-08 11:04:28,125 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 37/100 (estimated time remaining: 3 hours, 19 minutes, 30 seconds)
2025-05-08 11:07:19,660 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 11:07:24,804 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 830.26367 ± 691.439
2025-05-08 11:07:24,805 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [149.58902, 391.0389, 34.18582, 690.14185, 2361.187, 1183.3241, 755.2681, 382.79837, 650.3768, 1704.727]
2025-05-08 11:07:24,805 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [192.0, 267.0, 42.0, 255.0, 912.0, 477.0, 326.0, 220.0, 565.0, 655.0]
2025-05-08 11:07:24,809 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 38/100 (estimated time remaining: 3 hours, 8 minutes, 45 seconds)
2025-05-08 11:10:07,950 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 11:10:12,753 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 902.99207 ± 556.685
2025-05-08 11:10:12,753 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [246.60092, 1133.8864, 358.15485, 759.4643, 855.16724, 1616.6476, 1898.6719, 1190.9523, 59.53547, 910.8397]
2025-05-08 11:10:12,754 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [143.0, 453.0, 199.0, 308.0, 345.0, 637.0, 765.0, 470.0, 79.0, 350.0]
2025-05-08 11:10:12,758 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 57 minutes, 45 seconds)
2025-05-08 11:13:08,867 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 11:13:12,094 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 577.00439 ± 360.494
2025-05-08 11:13:12,094 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [1180.7092, 89.55838, 782.50006, 581.6946, 477.82117, 1007.6204, 109.010605, 750.95746, 126.28107, 663.8905]
2025-05-08 11:13:12,094 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [415.0, 127.0, 319.0, 221.0, 220.0, 385.0, 144.0, 353.0, 174.0, 292.0]
2025-05-08 11:13:12,099 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 55 minutes, 5 seconds)
2025-05-08 11:15:24,715 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 11:15:28,286 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 733.82574 ± 424.833
2025-05-08 11:15:28,286 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [766.1639, 96.03499, 1318.9192, 826.00793, 531.05176, 31.922642, 1268.3389, 557.9515, 1175.0375, 766.8291]
2025-05-08 11:15:28,286 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [330.0, 153.0, 535.0, 364.0, 257.0, 54.0, 547.0, 263.0, 494.0, 304.0]
2025-05-08 11:15:28,291 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 41/100 (estimated time remaining: 2 hours, 45 minutes, 53 seconds)
2025-05-08 11:17:48,285 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 11:17:50,694 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 466.01392 ± 579.296
2025-05-08 11:17:50,694 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [47.725063, 25.658632, 211.42064, 1998.2911, 654.42737, 786.6169, 603.0773, 30.933002, 29.696762, 272.29242]
2025-05-08 11:17:50,695 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [61.0, 39.0, 168.0, 757.0, 336.0, 352.0, 274.0, 41.0, 53.0, 141.0]
2025-05-08 11:17:50,699 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 42/100 (estimated time remaining: 2 hours, 37 minutes, 50 seconds)
2025-05-08 11:20:10,059 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 11:20:14,177 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 832.10339 ± 506.326
2025-05-08 11:20:14,178 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [101.25969, 83.39787, 702.7228, 735.02325, 822.7522, 742.17554, 1179.1791, 1282.1755, 1885.4609, 786.8869]
2025-05-08 11:20:14,178 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [157.0, 119.0, 381.0, 311.0, 330.0, 392.0, 492.0, 474.0, 752.0, 346.0]
2025-05-08 11:20:14,182 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 43/100 (estimated time remaining: 2 hours, 28 minutes, 44 seconds)
2025-05-08 11:22:28,637 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 11:22:31,464 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 468.60654 ± 455.984
2025-05-08 11:22:31,464 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [802.13715, 125.85436, 838.0153, 515.58936, 1452.4805, 168.81567, 712.63086, 27.994614, 25.79357, 16.75449]
2025-05-08 11:22:31,464 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [336.0, 176.0, 483.0, 291.0, 604.0, 230.0, 379.0, 42.0, 38.0, 28.0]
2025-05-08 11:22:31,469 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 44/100 (estimated time remaining: 2 hours, 20 minutes, 21 seconds)
2025-05-08 11:24:52,534 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 11:24:55,100 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 480.11371 ± 485.760
2025-05-08 11:24:55,100 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [1266.6553, 227.70967, 32.52362, 1059.9188, 22.475391, 49.272324, 19.003428, 486.42865, 1208.4017, 428.74814]
2025-05-08 11:24:55,100 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [587.0, 263.0, 47.0, 418.0, 38.0, 71.0, 33.0, 225.0, 501.0, 200.0]
2025-05-08 11:24:55,104 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 45/100 (estimated time remaining: 2 hours, 11 minutes, 13 seconds)
2025-05-08 11:27:10,602 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 11:27:13,794 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 682.05457 ± 354.408
2025-05-08 11:27:13,794 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [1162.4508, 95.58664, 612.31323, 67.04389, 558.3734, 871.13824, 992.9953, 639.6617, 1059.3612, 761.62134]
2025-05-08 11:27:13,794 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [495.0, 172.0, 278.0, 125.0, 252.0, 313.0, 410.0, 259.0, 416.0, 295.0]
2025-05-08 11:27:13,799 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 46/100 (estimated time remaining: 2 hours, 9 minutes, 20 seconds)
2025-05-08 11:29:29,874 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 11:29:34,090 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 718.74512 ± 692.085
2025-05-08 11:29:34,091 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [858.6168, 92.94055, 683.4752, 1460.0685, 1830.0792, 144.1121, 98.10522, 1789.8727, 90.09837, 140.08252]
2025-05-08 11:29:34,091 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [352.0, 144.0, 293.0, 586.0, 934.0, 197.0, 140.0, 737.0, 135.0, 194.0]
2025-05-08 11:29:34,096 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 47/100 (estimated time remaining: 2 hours, 6 minutes, 36 seconds)
2025-05-08 11:31:51,233 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 11:31:54,371 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 644.83899 ± 396.482
2025-05-08 11:31:54,371 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [853.4848, 916.55237, 1134.8251, 370.61832, 21.02999, 888.3948, 238.19524, 799.1837, 113.21512, 1112.8901]
2025-05-08 11:31:54,372 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [358.0, 432.0, 445.0, 201.0, 63.0, 402.0, 126.0, 335.0, 156.0, 441.0]
2025-05-08 11:31:54,377 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 48/100 (estimated time remaining: 2 hours, 3 minutes, 42 seconds)
2025-05-08 11:34:13,869 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 11:34:17,557 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 804.79291 ± 384.526
2025-05-08 11:34:17,557 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [1051.6964, 737.5254, 1082.3578, 850.92834, 794.98956, 18.180304, 373.0197, 1401.5814, 582.2007, 1155.4497]
2025-05-08 11:34:17,557 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [426.0, 375.0, 451.0, 345.0, 323.0, 30.0, 192.0, 530.0, 311.0, 461.0]
2025-05-08 11:34:17,563 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 49/100 (estimated time remaining: 2 hours, 2 minutes, 23 seconds)
2025-05-08 11:36:32,900 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 11:36:35,371 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 497.07178 ± 364.748
2025-05-08 11:36:35,371 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [39.479733, 750.3562, 1002.4675, 368.47046, 425.1398, 973.24365, 469.9989, 24.267258, 55.06136, 862.2331]
2025-05-08 11:36:35,371 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [56.0, 291.0, 427.0, 195.0, 211.0, 370.0, 315.0, 41.0, 91.0, 355.0]
2025-05-08 11:36:35,377 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 59 minutes, 2 seconds)
2025-05-08 11:38:54,538 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 11:38:58,106 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 735.48163 ± 420.482
2025-05-08 11:38:58,106 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [975.91974, 1670.0771, 807.4358, 853.4301, 342.56232, 294.3201, 981.2731, 809.0609, 166.48012, 454.2573]
2025-05-08 11:38:58,106 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [359.0, 664.0, 333.0, 350.0, 220.0, 166.0, 381.0, 353.0, 209.0, 282.0]
2025-05-08 11:38:58,112 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 57 minutes, 23 seconds)
2025-05-08 11:41:16,778 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 11:41:20,208 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 794.07275 ± 197.727
2025-05-08 11:41:20,208 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [366.9719, 1114.1476, 1075.5581, 684.70996, 798.75995, 835.9425, 770.89734, 814.7688, 795.1651, 683.8059]
2025-05-08 11:41:20,208 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [155.0, 446.0, 395.0, 253.0, 322.0, 358.0, 327.0, 384.0, 315.0, 298.0]
2025-05-08 11:41:20,213 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 55 minutes, 19 seconds)
2025-05-08 11:43:38,169 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 11:43:42,425 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 979.99591 ± 485.801
2025-05-08 11:43:42,425 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [963.3682, 375.2572, 1402.0936, 777.6808, 2211.6294, 732.81055, 700.11993, 1020.80084, 978.1441, 638.0532]
2025-05-08 11:43:42,425 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [358.0, 197.0, 530.0, 312.0, 845.0, 292.0, 296.0, 406.0, 360.0, 301.0]
2025-05-08 11:43:42,430 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 53 minutes, 17 seconds)
2025-05-08 11:46:00,849 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 11:46:04,092 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 730.56909 ± 127.426
2025-05-08 11:46:04,092 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [782.776, 764.1767, 805.5497, 521.2278, 932.5625, 580.3592, 842.58496, 809.0264, 569.50824, 697.9199]
2025-05-08 11:46:04,093 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [331.0, 363.0, 332.0, 218.0, 356.0, 266.0, 340.0, 326.0, 279.0, 275.0]
2025-05-08 11:46:04,099 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 50 minutes, 41 seconds)
2025-05-08 11:48:16,972 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 11:48:20,226 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 718.48260 ± 310.551
2025-05-08 11:48:20,226 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [199.71501, 1154.9773, 367.23422, 823.9823, 957.2227, 321.40594, 827.52484, 755.94946, 1094.0275, 682.7869]
2025-05-08 11:48:20,227 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [248.0, 434.0, 142.0, 309.0, 342.0, 192.0, 320.0, 321.0, 467.0, 299.0]
2025-05-08 11:48:20,232 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 48 minutes, 4 seconds)
2025-05-08 11:50:41,036 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 11:50:43,878 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 638.00861 ± 407.112
2025-05-08 11:50:43,878 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [737.9628, 930.8552, 625.65814, 268.6048, 1263.1974, 833.8995, 1116.6377, 560.74664, 21.476973, 21.04639]
2025-05-08 11:50:43,878 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [307.0, 348.0, 280.0, 109.0, 476.0, 353.0, 434.0, 297.0, 40.0, 31.0]
2025-05-08 11:50:43,885 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 45 minutes, 51 seconds)
2025-05-08 11:52:58,247 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 11:53:01,985 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 803.26691 ± 202.098
2025-05-08 11:53:01,985 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [815.36633, 829.7852, 738.5756, 381.48132, 752.4847, 930.603, 1154.492, 666.77124, 1043.5038, 719.6053]
2025-05-08 11:53:01,985 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [345.0, 302.0, 300.0, 161.0, 322.0, 383.0, 472.0, 550.0, 519.0, 297.0]
2025-05-08 11:53:01,990 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 42 minutes, 55 seconds)
2025-05-08 11:55:22,481 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 11:55:25,494 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 723.70459 ± 434.369
2025-05-08 11:55:25,494 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [1219.529, 902.9854, 1279.6477, 421.02988, 44.3959, 45.475426, 747.56494, 916.1571, 493.6949, 1166.5659]
2025-05-08 11:55:25,494 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [437.0, 330.0, 449.0, 297.0, 60.0, 68.0, 284.0, 324.0, 253.0, 549.0]
2025-05-08 11:55:25,500 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 40 minutes, 46 seconds)
2025-05-08 11:57:41,531 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 11:57:43,655 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 445.81152 ± 413.220
2025-05-08 11:57:43,655 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [226.85577, 883.333, 39.84843, 43.844055, 38.543922, 898.11426, 399.1145, 1145.2937, 21.46465, 761.7031]
2025-05-08 11:57:43,655 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [155.0, 348.0, 53.0, 55.0, 52.0, 335.0, 206.0, 436.0, 36.0, 353.0]
2025-05-08 11:57:43,662 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 37 minutes, 56 seconds)
2025-05-08 12:00:01,813 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 12:00:05,843 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 936.46375 ± 333.002
2025-05-08 12:00:05,843 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [406.87512, 706.81525, 1125.2559, 1147.086, 942.1271, 1136.3156, 1248.4384, 278.32593, 1183.755, 1189.6432]
2025-05-08 12:00:05,843 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [213.0, 332.0, 401.0, 408.0, 438.0, 434.0, 500.0, 157.0, 447.0, 405.0]
2025-05-08 12:00:05,850 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 36 minutes, 26 seconds)
2025-05-08 12:02:22,475 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 12:02:25,374 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 721.83282 ± 288.094
2025-05-08 12:02:25,374 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [940.53784, 1033.833, 758.71735, 305.08057, 231.02551, 865.6766, 979.8612, 355.199, 854.8003, 893.59717]
2025-05-08 12:02:25,374 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [314.0, 374.0, 291.0, 184.0, 121.0, 336.0, 356.0, 181.0, 313.0, 314.0]
2025-05-08 12:02:25,380 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 33 minutes, 31 seconds)
2025-05-08 12:04:46,277 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 12:04:49,032 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 626.47992 ± 357.419
2025-05-08 12:04:49,033 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [831.1751, 22.812418, 763.65985, 911.0018, 1132.5544, 1105.309, 297.60635, 300.05444, 382.3773, 518.24884]
2025-05-08 12:04:49,033 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [328.0, 36.0, 261.0, 345.0, 421.0, 422.0, 166.0, 185.0, 234.0, 236.0]
2025-05-08 12:04:49,039 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 31 minutes, 54 seconds)
2025-05-08 12:07:03,059 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 12:07:06,900 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 967.90771 ± 312.352
2025-05-08 12:07:06,900 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [754.4417, 1024.1183, 1136.3402, 1006.8726, 803.6973, 1073.9984, 1563.2301, 306.7175, 1184.4059, 825.25446]
2025-05-08 12:07:06,900 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [290.0, 352.0, 413.0, 342.0, 301.0, 403.0, 551.0, 160.0, 458.0, 317.0]
2025-05-08 12:07:06,907 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 28 minutes, 50 seconds)
2025-05-08 12:09:43,110 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 12:09:48,204 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 1015.21893 ± 148.469
2025-05-08 12:09:48,204 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [913.414, 1137.6472, 779.62933, 1143.0475, 1201.201, 872.9694, 1123.3411, 1022.9746, 1143.3516, 814.6136]
2025-05-08 12:09:48,204 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [332.0, 409.0, 297.0, 431.0, 433.0, 339.0, 412.0, 379.0, 458.0, 312.0]
2025-05-08 12:09:48,204 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1124 [INFO]: New best (1015.22) for latency ExtremeSparseL4U32
2025-05-08 12:09:48,205 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1127 [INFO]: saving network
2025-05-08 12:09:48,210 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-walker2d/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 12:09:48,224 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 29 minutes, 21 seconds)
2025-05-08 12:12:43,924 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 12:12:48,003 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 732.98413 ± 393.949
2025-05-08 12:12:48,004 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [1271.7306, 564.65894, 918.5288, 913.4751, 1066.8615, 810.91797, 33.35876, 946.1983, 786.0228, 18.088243]
2025-05-08 12:12:48,004 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [469.0, 279.0, 415.0, 362.0, 350.0, 338.0, 46.0, 368.0, 393.0, 36.0]
2025-05-08 12:12:48,013 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 31 minutes, 27 seconds)
2025-05-08 12:15:39,580 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 12:15:43,329 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 696.91364 ± 411.668
2025-05-08 12:15:43,329 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [1035.3043, 663.62897, 944.4282, 990.7027, 696.9026, 1199.1722, 274.72208, 47.589417, 1078.4296, 38.256252]
2025-05-08 12:15:43,329 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [410.0, 302.0, 359.0, 336.0, 299.0, 439.0, 155.0, 72.0, 411.0, 59.0]
2025-05-08 12:15:43,339 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 33 minutes, 5 seconds)
2025-05-08 12:18:04,270 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 12:18:07,996 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 846.26935 ± 495.508
2025-05-08 12:18:07,996 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [1471.4323, 959.00946, 1149.8452, 1321.576, 522.1715, 1265.2926, 392.69214, 18.038559, 177.86324, 1184.7715]
2025-05-08 12:18:07,997 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [455.0, 406.0, 408.0, 528.0, 308.0, 452.0, 194.0, 70.0, 231.0, 415.0]
2025-05-08 12:18:08,006 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 30 minutes, 32 seconds)
2025-05-08 12:20:26,900 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 12:20:29,778 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 641.61652 ± 510.858
2025-05-08 12:20:29,779 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [293.61792, 870.29694, 1208.2336, 1407.9196, 929.8824, 20.839918, 26.736414, 139.5817, 1207.6946, 311.36255]
2025-05-08 12:20:29,779 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [179.0, 338.0, 476.0, 494.0, 361.0, 32.0, 42.0, 131.0, 460.0, 184.0]
2025-05-08 12:20:29,786 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 28 minutes, 18 seconds)
2025-05-08 12:23:07,362 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 12:23:11,057 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 673.11346 ± 450.009
2025-05-08 12:23:11,057 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [301.48523, 1156.3838, 1260.309, 908.86566, 291.56247, 7.0302997, 13.330894, 1063.6678, 902.22687, 826.27277]
2025-05-08 12:23:11,057 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [179.0, 453.0, 455.0, 341.0, 170.0, 18.0, 26.0, 388.0, 368.0, 338.0]
2025-05-08 12:23:11,067 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 25 minutes, 38 seconds)
2025-05-08 12:26:09,379 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 12:26:13,999 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 905.67786 ± 444.046
2025-05-08 12:26:13,999 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [1424.0747, 1014.12964, 1200.8491, 506.2748, 1482.7833, 604.5414, 569.53076, 24.76293, 1263.785, 966.047]
2025-05-08 12:26:14,000 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [491.0, 417.0, 420.0, 227.0, 518.0, 265.0, 245.0, 35.0, 427.0, 359.0]
2025-05-08 12:26:14,009 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 23 minutes, 17 seconds)
2025-05-08 12:28:42,994 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 12:28:47,193 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 990.58820 ± 509.120
2025-05-08 12:28:47,194 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [1103.4781, 1290.2131, 48.305805, 1090.7972, 2098.1082, 1217.8763, 861.2894, 733.53503, 950.1052, 512.1727]
2025-05-08 12:28:47,194 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [369.0, 441.0, 68.0, 385.0, 684.0, 445.0, 361.0, 305.0, 447.0, 366.0]
2025-05-08 12:28:47,200 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 18 minutes, 23 seconds)
2025-05-08 12:31:02,797 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 12:31:06,216 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 748.19788 ± 452.115
2025-05-08 12:31:06,216 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [1382.6345, 1233.5969, 27.589247, 60.69677, 284.33102, 1057.0085, 1042.1525, 873.9027, 737.7315, 782.33496]
2025-05-08 12:31:06,216 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [550.0, 504.0, 41.0, 86.0, 168.0, 407.0, 421.0, 319.0, 339.0, 330.0]
2025-05-08 12:31:06,224 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 15 minutes, 13 seconds)
2025-05-08 12:33:26,944 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 12:33:29,392 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 577.92621 ± 496.076
2025-05-08 12:33:29,392 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [1106.7627, 19.135565, 321.65912, 40.019253, 1187.6351, 969.5429, 959.3274, 1093.5983, 56.034637, 25.547226]
2025-05-08 12:33:29,392 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [370.0, 36.0, 173.0, 60.0, 426.0, 395.0, 369.0, 382.0, 63.0, 50.0]
2025-05-08 12:33:29,401 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 12 minutes, 45 seconds)
2025-05-08 12:35:49,556 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 12:35:53,095 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 831.35822 ± 461.943
2025-05-08 12:35:53,095 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [942.0014, 1032.4479, 1253.3745, 1307.9204, 426.275, 1199.9895, 955.6189, 1128.42, 38.83884, 28.694834]
2025-05-08 12:35:53,095 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [474.0, 373.0, 428.0, 460.0, 220.0, 451.0, 384.0, 422.0, 53.0, 36.0]
2025-05-08 12:35:53,103 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 8 minutes, 34 seconds)
2025-05-08 12:38:07,292 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 12:38:11,029 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 909.98975 ± 539.478
2025-05-08 12:38:11,029 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [1035.4371, 1161.6658, 1212.1534, 751.8179, 295.78882, 1182.4597, 1954.9546, 1146.5216, 335.2648, 23.834051]
2025-05-08 12:38:11,029 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [392.0, 431.0, 429.0, 295.0, 175.0, 387.0, 701.0, 416.0, 198.0, 35.0]
2025-05-08 12:38:11,036 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 2 minutes, 8 seconds)
2025-05-08 12:40:29,358 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 12:40:33,467 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 919.24377 ± 521.847
2025-05-08 12:40:33,467 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [1507.1161, 71.44745, 1196.2711, 973.11945, 1552.92, 1256.0223, 288.10077, 1003.71826, 153.97668, 1189.7449]
2025-05-08 12:40:33,467 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [583.0, 129.0, 448.0, 394.0, 575.0, 424.0, 142.0, 416.0, 220.0, 444.0]
2025-05-08 12:40:33,476 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 76/100 (estimated time remaining: 58 minutes, 51 seconds)
2025-05-08 12:42:50,279 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 12:42:52,432 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 461.91635 ± 500.957
2025-05-08 12:42:52,432 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [353.64493, 20.091217, 1472.769, 1042.6948, 957.7268, 92.15843, 34.920074, 576.25543, 42.079838, 26.822628]
2025-05-08 12:42:52,432 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [207.0, 40.0, 489.0, 451.0, 368.0, 55.0, 47.0, 295.0, 53.0, 42.0]
2025-05-08 12:42:52,439 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 77/100 (estimated time remaining: 56 minutes, 29 seconds)
2025-05-08 12:45:09,981 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 12:45:13,473 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 814.22253 ± 510.370
2025-05-08 12:45:13,473 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [1026.9296, 1303.3855, 1027.3818, 30.953953, 1244.4763, 1491.8807, 1076.9136, 40.83617, 244.63985, 654.82825]
2025-05-08 12:45:13,473 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [398.0, 498.0, 395.0, 47.0, 426.0, 540.0, 418.0, 48.0, 149.0, 327.0]
2025-05-08 12:45:13,481 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 78/100 (estimated time remaining: 53 minutes, 58 seconds)
2025-05-08 12:47:30,381 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 12:47:34,144 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 879.01843 ± 457.968
2025-05-08 12:47:34,144 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [867.6559, 1107.3615, 1140.205, 619.6812, 1347.6936, 1188.497, 30.00457, 193.60887, 1507.6927, 787.7846]
2025-05-08 12:47:34,144 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [367.0, 469.0, 416.0, 235.0, 462.0, 415.0, 50.0, 263.0, 517.0, 322.0]
2025-05-08 12:47:34,151 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 79/100 (estimated time remaining: 51 minutes, 24 seconds)
2025-05-08 12:49:52,341 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 12:49:55,740 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 833.32727 ± 453.152
2025-05-08 12:49:55,740 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [17.10343, 612.30536, 528.5909, 1250.5344, 371.1284, 1199.394, 933.88513, 1272.4073, 1524.029, 623.89484]
2025-05-08 12:49:55,740 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [26.0, 256.0, 224.0, 436.0, 192.0, 431.0, 396.0, 417.0, 528.0, 284.0]
2025-05-08 12:49:55,749 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 80/100 (estimated time remaining: 49 minutes, 19 seconds)
2025-05-08 12:52:19,713 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 12:52:24,439 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 850.36719 ± 584.218
2025-05-08 12:52:24,439 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [1764.8604, 1617.3947, 566.5694, 863.4191, 18.887033, 848.6916, 363.42715, 23.094948, 1327.3511, 1109.9767]
2025-05-08 12:52:24,440 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [616.0, 572.0, 260.0, 341.0, 31.0, 308.0, 166.0, 37.0, 677.0, 416.0]
2025-05-08 12:52:24,450 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 81/100 (estimated time remaining: 47 minutes, 23 seconds)
2025-05-08 12:55:13,775 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 12:55:18,581 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 916.71466 ± 336.619
2025-05-08 12:55:18,581 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [569.8957, 1169.006, 1215.8411, 380.01077, 1180.8767, 1030.0952, 990.91003, 319.7061, 1236.5033, 1074.3019]
2025-05-08 12:55:18,581 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [242.0, 427.0, 436.0, 174.0, 425.0, 394.0, 410.0, 193.0, 458.0, 411.0]
2025-05-08 12:55:18,592 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 82/100 (estimated time remaining: 47 minutes, 15 seconds)
2025-05-08 12:58:17,150 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 12:58:22,055 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 925.54071 ± 533.665
2025-05-08 12:58:22,055 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [1367.1278, 25.182589, 142.9119, 901.02344, 1596.2714, 314.0906, 1234.0612, 1229.1547, 1366.6649, 1078.9193]
2025-05-08 12:58:22,056 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [550.0, 35.0, 96.0, 343.0, 624.0, 197.0, 392.0, 440.0, 522.0, 406.0]
2025-05-08 12:58:22,066 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 83/100 (estimated time remaining: 47 minutes, 18 seconds)
2025-05-08 13:00:45,502 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 13:00:48,907 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 862.03894 ± 511.093
2025-05-08 13:00:48,908 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [1334.5961, 1568.2157, 837.90894, 708.9116, 67.00748, 592.9496, 1462.9882, 853.47687, 22.658096, 1171.6766]
2025-05-08 13:00:48,908 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [446.0, 545.0, 314.0, 331.0, 80.0, 234.0, 484.0, 371.0, 54.0, 414.0]
2025-05-08 13:00:48,916 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 84/100 (estimated time remaining: 45 minutes, 2 seconds)
2025-05-08 13:03:08,315 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 13:03:13,306 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 1323.29785 ± 494.208
2025-05-08 13:03:13,306 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [1686.9222, 1919.261, 1379.8643, 1309.933, 1176.5468, 1300.5114, 30.291157, 1478.6724, 1169.5037, 1781.4723]
2025-05-08 13:03:13,306 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [598.0, 629.0, 505.0, 489.0, 420.0, 446.0, 44.0, 553.0, 491.0, 662.0]
2025-05-08 13:03:13,306 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1124 [INFO]: New best (1323.30) for latency ExtremeSparseL4U32
2025-05-08 13:03:13,306 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1127 [INFO]: saving network
2025-05-08 13:03:13,310 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-walker2d/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 13:03:13,321 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 85/100 (estimated time remaining: 42 minutes, 32 seconds)
2025-05-08 13:05:25,302 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 13:05:28,495 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 812.02197 ± 526.511
2025-05-08 13:05:28,495 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [1445.8445, 171.77989, 951.9105, 1173.7526, 1199.3579, 602.76355, 31.378555, 32.195244, 1306.4766, 1204.7604]
2025-05-08 13:05:28,495 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [501.0, 211.0, 370.0, 411.0, 456.0, 266.0, 53.0, 56.0, 468.0, 441.0]
2025-05-08 13:05:28,504 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 86/100 (estimated time remaining: 39 minutes, 12 seconds)
2025-05-08 13:07:45,942 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 13:07:49,385 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 876.03204 ± 549.949
2025-05-08 13:07:49,385 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [1026.6017, 1314.6207, 620.0351, 83.87808, 1278.077, 33.70816, 257.10977, 1293.0166, 1224.2141, 1629.059]
2025-05-08 13:07:49,385 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [427.0, 437.0, 270.0, 130.0, 486.0, 56.0, 133.0, 469.0, 434.0, 562.0]
2025-05-08 13:07:49,393 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 87/100 (estimated time remaining: 35 minutes, 2 seconds)
2025-05-08 13:10:06,979 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 13:10:11,056 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 1009.25989 ± 451.140
2025-05-08 13:10:11,056 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [1531.7476, 1215.8098, 1112.5518, 1151.982, 916.0395, 663.3456, 214.81065, 1460.2753, 1507.6356, 318.40097]
2025-05-08 13:10:11,056 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [521.0, 415.0, 437.0, 452.0, 363.0, 299.0, 114.0, 544.0, 530.0, 195.0]
2025-05-08 13:10:11,065 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 88/100 (estimated time remaining: 30 minutes, 43 seconds)
2025-05-08 13:12:23,513 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 13:12:27,753 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 1066.92004 ± 502.984
2025-05-08 13:12:27,753 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [349.6491, 1376.6733, 1862.9567, 1244.1477, 1195.0996, 276.23486, 1224.01, 1314.0812, 423.39673, 1402.9515]
2025-05-08 13:12:27,753 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [210.0, 476.0, 669.0, 458.0, 445.0, 161.0, 455.0, 442.0, 206.0, 510.0]
2025-05-08 13:12:27,762 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 89/100 (estimated time remaining: 27 minutes, 57 seconds)
2025-05-08 13:14:45,246 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 13:14:49,089 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 915.42908 ± 586.258
2025-05-08 13:14:49,090 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [676.7328, 1142.0492, 1754.6129, 37.865612, 812.9884, 516.0448, 677.8797, 239.84584, 1427.0658, 1869.2059]
2025-05-08 13:14:49,090 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [244.0, 541.0, 595.0, 56.0, 289.0, 265.0, 319.0, 125.0, 494.0, 701.0]
2025-05-08 13:14:49,098 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 90/100 (estimated time remaining: 25 minutes, 30 seconds)
2025-05-08 13:17:06,346 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 13:17:09,843 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 813.58093 ± 543.604
2025-05-08 13:17:09,844 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [1422.1487, 93.45383, 1663.414, 837.74634, 180.16203, 616.4625, 395.9674, 840.32324, 1601.0945, 485.03702]
2025-05-08 13:17:09,844 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [500.0, 135.0, 564.0, 335.0, 106.0, 299.0, 224.0, 336.0, 543.0, 314.0]
2025-05-08 13:17:09,853 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 91/100 (estimated time remaining: 23 minutes, 22 seconds)
2025-05-08 13:19:27,670 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 13:19:31,460 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 948.08789 ± 525.936
2025-05-08 13:19:31,460 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [1072.7013, 964.98596, 1316.6228, 287.43698, 174.33533, 1416.3182, 1467.0825, 1774.3685, 423.89478, 583.13257]
2025-05-08 13:19:31,460 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [394.0, 400.0, 443.0, 168.0, 124.0, 480.0, 512.0, 566.0, 259.0, 277.0]
2025-05-08 13:19:31,469 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 92/100 (estimated time remaining: 21 minutes, 3 seconds)
2025-05-08 13:21:42,615 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 13:21:46,604 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 1024.40234 ± 555.788
2025-05-08 13:21:46,604 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [1529.6226, 1731.9637, 1420.4548, 1376.7139, 281.5886, 1193.0514, 354.99588, 31.662832, 1176.5568, 1147.4128]
2025-05-08 13:21:46,604 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [505.0, 593.0, 541.0, 459.0, 161.0, 418.0, 203.0, 51.0, 411.0, 451.0]
2025-05-08 13:21:46,613 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 93/100 (estimated time remaining: 18 minutes, 32 seconds)
2025-05-08 13:24:07,487 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 13:24:11,685 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 1050.18140 ± 612.206
2025-05-08 13:24:11,685 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [1872.234, 1021.7749, 1370.0264, 857.5828, 1823.9922, 540.87976, 30.109972, 1441.8175, 169.70374, 1373.6918]
2025-05-08 13:24:11,685 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [618.0, 433.0, 480.0, 338.0, 590.0, 268.0, 78.0, 491.0, 194.0, 485.0]
2025-05-08 13:24:11,694 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 94/100 (estimated time remaining: 16 minutes, 25 seconds)
2025-05-08 13:26:22,353 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 13:26:26,444 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 1082.59167 ± 420.442
2025-05-08 13:26:26,444 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [1204.3442, 1156.112, 1248.3588, 342.2075, 445.203, 639.141, 1592.6005, 1381.0657, 1347.7354, 1469.1493]
2025-05-08 13:26:26,444 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [444.0, 410.0, 427.0, 219.0, 216.0, 290.0, 494.0, 481.0, 435.0, 496.0]
2025-05-08 13:26:26,453 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 95/100 (estimated time remaining: 13 minutes, 56 seconds)
2025-05-08 13:28:44,666 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 13:28:48,051 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 760.60883 ± 726.942
2025-05-08 13:28:48,051 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [291.6294, 181.7385, 2192.6326, 302.8933, 184.31828, 1228.0647, 218.09775, 1639.2081, 1332.0283, 35.477013]
2025-05-08 13:28:48,051 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [170.0, 144.0, 773.0, 187.0, 120.0, 483.0, 234.0, 579.0, 450.0, 60.0]
2025-05-08 13:28:48,061 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 96/100 (estimated time remaining: 11 minutes, 38 seconds)
2025-05-08 13:31:03,223 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 13:31:08,575 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 1398.43652 ± 396.483
2025-05-08 13:31:08,575 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [1656.2532, 1758.551, 1995.9993, 1779.8951, 1340.5015, 1043.1553, 665.4185, 960.4631, 1477.0505, 1307.0773]
2025-05-08 13:31:08,575 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [562.0, 564.0, 703.0, 565.0, 454.0, 406.0, 355.0, 370.0, 525.0, 480.0]
2025-05-08 13:31:08,575 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1124 [INFO]: New best (1398.44) for latency ExtremeSparseL4U32
2025-05-08 13:31:08,576 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1127 [INFO]: saving network
2025-05-08 13:31:08,579 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of SAC to _logs/benchmark-v3-tc7/noisy-walker2d/ExtremeSparseL4U32-sac-aug-mem4/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-08 13:31:08,593 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 97/100 (estimated time remaining: 9 minutes, 17 seconds)
2025-05-08 13:33:25,260 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 13:33:29,569 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 1095.16821 ± 490.875
2025-05-08 13:33:29,569 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [639.4832, 223.81444, 1686.078, 917.0846, 1441.1626, 1428.0994, 1584.7253, 1144.4792, 1464.4603, 422.29492]
2025-05-08 13:33:29,569 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [308.0, 139.0, 576.0, 326.0, 484.0, 492.0, 543.0, 422.0, 517.0, 271.0]
2025-05-08 13:33:29,579 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 98/100 (estimated time remaining: 7 minutes, 1 second)
2025-05-08 13:35:45,378 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 13:35:48,765 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 816.49249 ± 607.542
2025-05-08 13:35:48,765 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [873.87146, 1362.1675, 1593.322, 1401.6737, 628.7392, 19.412895, 36.885513, 679.2065, 28.220648, 1541.425]
2025-05-08 13:35:48,765 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [360.0, 478.0, 521.0, 508.0, 271.0, 31.0, 50.0, 274.0, 46.0, 666.0]
2025-05-08 13:35:48,775 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 99/100 (estimated time remaining: 4 minutes, 38 seconds)
2025-05-08 13:38:06,575 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 13:38:10,408 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 998.93665 ± 642.683
2025-05-08 13:38:10,409 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [1427.3417, 1669.6917, 1387.0824, 292.05188, 1871.8547, 806.4474, 15.5727625, 25.46677, 1176.4319, 1317.4248]
2025-05-08 13:38:10,409 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [492.0, 558.0, 506.0, 171.0, 625.0, 377.0, 30.0, 36.0, 412.0, 421.0]
2025-05-08 13:38:10,418 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1097 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 20 seconds)
2025-05-08 13:40:23,581 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-08 13:40:28,379 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1119 [DEBUG]: Total Reward: 1223.16187 ± 322.859
2025-05-08 13:40:28,380 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1120 [DEBUG]: All rewards: [906.6013, 1094.765, 1719.3232, 1716.719, 1038.4711, 1204.2896, 1087.1334, 1650.5615, 916.6919, 897.06195]
2025-05-08 13:40:28,380 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [389.0, 447.0, 576.0, 571.0, 393.0, 437.0, 407.0, 572.0, 386.0, 344.0]
2025-05-08 13:40:28,390 latency_env.delayed_mdp:training_loop(baseline-sac-noisy-walker2d):1149 [DEBUG]: Training session finished
