2025-09-13 19:25:16,429 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noiseperc25-walker2d/ExtremeSparseL4U32-mbpac_memdelay
2025-09-13 19:25:16,429 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noiseperc25-walker2d/ExtremeSparseL4U32-mbpac_memdelay
2025-09-13 19:25:16,429 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x15031e6e8e90>}
2025-09-13 19:25:16,429 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1111 [DEBUG]: using device: cuda
2025-09-13 19:25:16,434 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1133 [INFO]: Creating new trainer
2025-09-13 19:25:16,452 baseline-mbpac-noiseperc25-walker2d:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-09-13 19:25:16,452 baseline-mbpac-noiseperc25-walker2d:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-13 19:25:16,460 baseline-mbpac-noiseperc25-walker2d:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=6, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 384, batch_first=True)
)
2025-09-13 19:25:19,020 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1194 [DEBUG]: Starting training session...
2025-09-13 19:25:19,020 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 1/100
2025-09-13 19:36:16,475 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:36:16,482 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:36:56,422 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 42.03238 ± 82.333
2025-09-13 19:36:56,423 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [-2.1203754, 2.742758, -11.3653, 277.6551, 78.32916, -9.708658, 30.461346, 15.406333, 13.625544, 25.297897]
2025-09-13 19:36:56,423 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [133.0, 94.0, 124.0, 194.0, 176.0, 111.0, 220.0, 29.0, 104.0, 146.0]
2025-09-13 19:36:56,423 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1226 [INFO]: New best (42.03) for latency ExtremeSparseL4U32
2025-09-13 19:36:56,452 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 2/100 (estimated time remaining: 19 hours, 10 minutes, 45 seconds)
2025-09-13 19:47:37,900 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:47:37,907 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:48:11,947 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 31.84396 ± 34.571
2025-09-13 19:48:11,948 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [30.455036, 4.5545106, 129.5043, 35.963097, 19.721094, 14.602782, 10.920851, 13.016523, 44.09966, 15.601753]
2025-09-13 19:48:11,948 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [107.0, 65.0, 188.0, 109.0, 39.0, 125.0, 69.0, 212.0, 73.0, 165.0]
2025-09-13 19:48:11,952 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 3/100 (estimated time remaining: 18 hours, 41 minutes, 13 seconds)
2025-09-13 19:58:54,199 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:58:54,221 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:59:25,754 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 60.21265 ± 50.823
2025-09-13 19:59:25,754 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [44.211613, 49.07841, 156.29047, 147.76714, 69.027054, 71.69661, 28.324036, 2.0196567, 7.3525343, 26.358902]
2025-09-13 19:59:25,754 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [121.0, 89.0, 91.0, 247.0, 187.0, 134.0, 56.0, 17.0, 21.0, 76.0]
2025-09-13 19:59:25,754 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1226 [INFO]: New best (60.21) for latency ExtremeSparseL4U32
2025-09-13 19:59:25,786 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 4/100 (estimated time remaining: 18 hours, 22 minutes, 58 seconds)
2025-09-13 20:10:03,347 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:10:03,353 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:10:37,622 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 104.56862 ± 82.843
2025-09-13 20:10:37,623 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [131.88101, 195.27618, 286.83728, 74.977936, 30.422401, -1.2345886, 20.15203, 136.32198, 83.69164, 87.36033]
2025-09-13 20:10:37,623 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [99.0, 195.0, 176.0, 92.0, 115.0, 83.0, 37.0, 187.0, 80.0, 75.0]
2025-09-13 20:10:37,623 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1226 [INFO]: New best (104.57) for latency ExtremeSparseL4U32
2025-09-13 20:10:37,644 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 5/100 (estimated time remaining: 18 hours, 7 minutes, 26 seconds)
2025-09-13 20:21:22,268 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:21:22,276 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:21:46,526 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 73.92154 ± 112.903
2025-09-13 20:21:46,526 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [57.53139, 361.12494, 33.924046, -0.39609963, 0.56042445, 212.59793, 21.982996, 4.9922814, 8.655504, 38.241955]
2025-09-13 20:21:46,526 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [98.0, 220.0, 68.0, 18.0, 13.0, 147.0, 93.0, 16.0, 26.0, 100.0]
2025-09-13 20:21:46,549 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 6/100 (estimated time remaining: 17 hours, 52 minutes, 43 seconds)
2025-09-13 20:32:26,913 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:32:26,922 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:33:19,397 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 224.28044 ± 144.130
2025-09-13 20:33:19,403 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [200.70807, 432.0282, 37.36203, 221.25906, 342.1359, 246.11865, 32.708107, 353.50638, 17.108335, 359.86978]
2025-09-13 20:33:19,403 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [127.0, 295.0, 89.0, 234.0, 235.0, 139.0, 94.0, 257.0, 36.0, 214.0]
2025-09-13 20:33:19,403 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1226 [INFO]: New best (224.28) for latency ExtremeSparseL4U32
2025-09-13 20:33:19,412 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 7/100 (estimated time remaining: 17 hours, 39 minutes, 59 seconds)
2025-09-13 20:43:53,765 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:43:53,772 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:44:26,174 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 153.45065 ± 171.921
2025-09-13 20:44:26,175 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [283.27106, 73.21571, 366.96027, 17.556635, 227.80489, 36.22932, 506.68277, 23.001526, -1.244705, 1.0292226]
2025-09-13 20:44:26,175 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [148.0, 66.0, 229.0, 46.0, 138.0, 61.0, 340.0, 32.0, 14.0, 13.0]
2025-09-13 20:44:26,184 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 8/100 (estimated time remaining: 17 hours, 26 minutes)
2025-09-13 20:55:12,624 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:55:12,630 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:55:49,966 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 132.13950 ± 120.074
2025-09-13 20:55:49,966 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [159.99796, 261.28314, 357.6868, 3.9286246, 12.852839, -1.3256509, 53.927486, 205.85158, 44.664143, 222.52803]
2025-09-13 20:55:49,966 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [181.0, 169.0, 258.0, 13.0, 25.0, 13.0, 85.0, 193.0, 61.0, 229.0]
2025-09-13 20:55:49,976 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 9/100 (estimated time remaining: 17 hours, 17 minutes, 49 seconds)
2025-09-13 21:06:41,221 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:06:41,240 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:07:22,159 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 189.84471 ± 194.243
2025-09-13 21:07:22,159 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [356.62173, 88.66045, 1.5940216, 582.2256, 74.82918, 41.478676, 20.647593, 393.96832, 316.62625, 21.795347]
2025-09-13 21:07:22,159 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [188.0, 80.0, 19.0, 400.0, 106.0, 44.0, 39.0, 244.0, 202.0, 46.0]
2025-09-13 21:07:22,164 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 10/100 (estimated time remaining: 17 hours, 12 minutes, 42 seconds)
2025-09-13 21:18:01,956 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:18:01,963 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:18:45,788 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 221.55635 ± 160.980
2025-09-13 21:18:45,788 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [200.5359, 3.592633, 428.4707, 9.804374, 189.95996, 190.76196, 410.66312, 375.27274, 27.840431, 378.66162]
2025-09-13 21:18:45,788 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [119.0, 15.0, 302.0, 18.0, 153.0, 220.0, 246.0, 169.0, 42.0, 168.0]
2025-09-13 21:18:45,807 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 11/100 (estimated time remaining: 17 hours, 5 minutes, 46 seconds)
2025-09-13 21:29:23,476 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:29:23,483 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:29:42,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 44.45576 ± 34.084
2025-09-13 21:29:42,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [54.345882, 17.646383, 77.43991, 30.862469, 0.9785288, 19.193832, 33.211422, 13.583348, 103.45271, 93.84308]
2025-09-13 21:29:42,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [52.0, 24.0, 72.0, 56.0, 11.0, 40.0, 46.0, 54.0, 134.0, 156.0]
2025-09-13 21:29:42,967 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 12/100 (estimated time remaining: 16 hours, 43 minutes, 47 seconds)
2025-09-13 21:40:25,628 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:40:25,639 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:41:04,634 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 144.06099 ± 184.936
2025-09-13 21:41:04,634 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [50.114388, 396.6777, 1.1744794, 306.07526, 2.7277672, 12.575853, 42.101376, 538.007, 30.481953, 60.67405]
2025-09-13 21:41:04,635 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [91.0, 278.0, 13.0, 179.0, 27.0, 48.0, 88.0, 376.0, 47.0, 169.0]
2025-09-13 21:41:04,665 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 13/100 (estimated time remaining: 16 hours, 36 minutes, 53 seconds)
2025-09-13 21:51:36,587 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:51:36,594 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:52:11,841 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 175.23219 ± 159.938
2025-09-13 21:52:11,841 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [2.9603302, 73.96277, 4.928347, 266.93948, 12.057559, 58.24605, 271.87357, 267.32455, 292.25632, 501.77295]
2025-09-13 21:52:11,841 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [14.0, 96.0, 16.0, 153.0, 22.0, 102.0, 151.0, 174.0, 178.0, 286.0]
2025-09-13 21:52:11,882 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 14/100 (estimated time remaining: 16 hours, 20 minutes, 45 seconds)
2025-09-13 22:02:57,279 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 22:02:57,287 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 22:03:26,456 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 144.66429 ± 139.370
2025-09-13 22:03:26,456 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [224.8965, 28.429197, 50.27608, 417.44073, 22.32294, 151.68436, 315.9727, -1.0005407, 8.755387, 227.86555]
2025-09-13 22:03:26,456 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [152.0, 38.0, 55.0, 241.0, 29.0, 102.0, 192.0, 11.0, 18.0, 123.0]
2025-09-13 22:03:26,476 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 15/100 (estimated time remaining: 16 hours, 4 minutes, 26 seconds)
2025-09-13 22:14:15,380 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 22:14:15,387 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 22:14:57,074 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 192.54269 ± 172.014
2025-09-13 22:14:57,074 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [141.49428, 228.89326, 44.420933, 224.32195, 241.10371, 109.984886, 44.261417, 19.609438, 223.86305, 647.4742]
2025-09-13 22:14:57,074 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [159.0, 137.0, 48.0, 128.0, 157.0, 153.0, 61.0, 27.0, 139.0, 382.0]
2025-09-13 22:14:57,080 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 16/100 (estimated time remaining: 15 hours, 55 minutes, 11 seconds)
2025-09-13 22:25:39,073 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 22:25:39,080 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 22:26:15,307 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 174.95969 ± 156.243
2025-09-13 22:26:15,307 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [90.24954, 455.7065, 7.046034, 44.807877, 265.42007, -1.8208233, 5.566215, 322.143, 302.94827, 257.53012]
2025-09-13 22:26:15,307 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [142.0, 381.0, 20.0, 65.0, 127.0, 24.0, 16.0, 144.0, 144.0, 160.0]
2025-09-13 22:26:15,321 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 17/100 (estimated time remaining: 15 hours, 49 minutes, 51 seconds)
2025-09-13 22:36:56,290 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 22:36:56,296 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 22:37:56,784 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 300.43228 ± 168.838
2025-09-13 22:37:56,785 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [463.73795, 359.1965, 422.0315, 523.05316, 467.47607, 194.65541, 331.5425, 37.637363, 63.022778, 141.9698]
2025-09-13 22:37:56,785 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [238.0, 241.0, 270.0, 326.0, 306.0, 141.0, 180.0, 66.0, 97.0, 171.0]
2025-09-13 22:37:56,785 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1226 [INFO]: New best (300.43) for latency ExtremeSparseL4U32
2025-09-13 22:37:56,806 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 18/100 (estimated time remaining: 15 hours, 44 minutes, 1 second)
2025-09-13 22:48:38,402 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 22:48:38,409 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 22:49:39,527 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 271.59442 ± 184.495
2025-09-13 22:49:39,528 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [257.0484, 147.40466, 130.30528, 531.25525, 461.84915, 238.03142, 0.5580655, 548.6864, 59.58052, 341.22488]
2025-09-13 22:49:39,528 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [227.0, 98.0, 158.0, 289.0, 340.0, 118.0, 11.0, 489.0, 69.0, 227.0]
2025-09-13 22:49:39,540 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 19/100 (estimated time remaining: 15 hours, 42 minutes, 21 seconds)
2025-09-13 23:00:15,343 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 23:00:15,350 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 23:01:04,394 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 251.09367 ± 134.741
2025-09-13 23:01:04,394 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [196.38716, 522.6536, 248.68906, 187.4943, 243.64238, 249.30565, 57.453335, 98.18802, 450.5906, 256.53275]
2025-09-13 23:01:04,394 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [135.0, 305.0, 170.0, 123.0, 127.0, 188.0, 92.0, 116.0, 260.0, 141.0]
2025-09-13 23:01:04,416 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 20/100 (estimated time remaining: 15 hours, 33 minutes, 38 seconds)
2025-09-13 23:11:47,724 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 23:11:47,731 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 23:12:30,874 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 245.83865 ± 155.020
2025-09-13 23:12:30,874 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [509.1396, 35.049736, 1.5009773, 365.8986, 299.6743, 271.3984, 264.7709, 284.07474, 363.38464, 63.49462]
2025-09-13 23:12:30,874 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [225.0, 61.0, 11.0, 213.0, 141.0, 160.0, 158.0, 194.0, 171.0, 87.0]
2025-09-13 23:12:30,881 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 21/100 (estimated time remaining: 15 hours, 21 minutes)
2025-09-13 23:23:19,020 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 23:23:19,026 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 23:24:10,850 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 291.82614 ± 84.415
2025-09-13 23:24:10,851 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [212.67174, 242.71638, 251.42412, 335.31546, 289.98172, 197.15952, 365.1906, 255.8768, 268.9829, 498.9421]
2025-09-13 23:24:10,851 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [199.0, 132.0, 116.0, 197.0, 153.0, 251.0, 177.0, 139.0, 163.0, 216.0]
2025-09-13 23:24:10,860 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 22/100 (estimated time remaining: 15 hours, 15 minutes, 13 seconds)
2025-09-13 23:34:56,815 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 23:34:56,823 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 23:35:39,102 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 178.65106 ± 89.214
2025-09-13 23:35:39,102 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [127.64366, 61.22249, 66.60573, 258.1871, 248.19995, 298.38235, 244.48721, 212.03624, 223.9404, 45.805573]
2025-09-13 23:35:39,102 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [102.0, 70.0, 118.0, 172.0, 156.0, 269.0, 169.0, 151.0, 140.0, 56.0]
2025-09-13 23:35:39,107 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 23/100 (estimated time remaining: 15 hours, 11 seconds)
2025-09-13 23:46:23,247 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 23:46:23,254 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 23:46:59,525 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 211.43021 ± 118.836
2025-09-13 23:46:59,525 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [47.512154, 264.45258, 218.68288, 280.44052, 235.67458, 102.198975, 10.047645, 431.4689, 253.68394, 270.13977]
2025-09-13 23:46:59,525 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [51.0, 145.0, 112.0, 154.0, 118.0, 95.0, 22.0, 186.0, 193.0, 148.0]
2025-09-13 23:46:59,546 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 24/100 (estimated time remaining: 14 hours, 42 minutes, 56 seconds)
2025-09-13 23:57:46,140 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 23:57:46,147 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 23:58:46,565 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 329.95178 ± 139.829
2025-09-13 23:58:46,565 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [440.9577, 468.23456, 545.8216, 199.55028, 318.97626, 176.09747, 62.28184, 389.0855, 329.3664, 369.14606]
2025-09-13 23:58:46,565 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [269.0, 244.0, 357.0, 106.0, 153.0, 109.0, 95.0, 201.0, 221.0, 247.0]
2025-09-13 23:58:46,565 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1226 [INFO]: New best (329.95) for latency ExtremeSparseL4U32
2025-09-13 23:58:46,602 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 25/100 (estimated time remaining: 14 hours, 37 minutes, 5 seconds)
2025-09-14 00:09:22,127 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 00:09:22,134 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 00:10:18,980 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 283.36652 ± 141.203
2025-09-14 00:10:18,980 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [79.67218, 1.2705274, 303.60855, 310.3331, 342.81027, 253.10716, 355.50082, 449.30145, 256.53012, 481.53098]
2025-09-14 00:10:18,980 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [73.0, 12.0, 212.0, 177.0, 223.0, 150.0, 201.0, 292.0, 183.0, 361.0]
2025-09-14 00:10:19,015 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 26/100 (estimated time remaining: 14 hours, 27 minutes, 2 seconds)
2025-09-14 00:20:58,156 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 00:20:58,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 00:21:38,199 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 220.58267 ± 198.850
2025-09-14 00:21:38,200 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [-5.0065384, 11.8, 529.5585, 443.44678, 25.963226, 191.9292, 230.22653, 6.2574472, 295.10474, 476.547]
2025-09-14 00:21:38,200 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [13.0, 24.0, 290.0, 256.0, 54.0, 121.0, 128.0, 26.0, 147.0, 247.0]
2025-09-14 00:21:38,238 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 27/100 (estimated time remaining: 14 hours, 10 minutes, 21 seconds)
2025-09-14 00:32:38,868 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 00:32:38,875 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 00:33:26,054 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 288.64395 ± 114.760
2025-09-14 00:33:26,054 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [419.46817, 410.77652, 264.82373, 232.82088, 397.15045, 295.95978, 194.2196, 19.488811, 322.2585, 329.47287]
2025-09-14 00:33:26,054 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [217.0, 224.0, 140.0, 139.0, 231.0, 176.0, 117.0, 33.0, 151.0, 158.0]
2025-09-14 00:33:26,068 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 28/100 (estimated time remaining: 14 hours, 3 minutes, 37 seconds)
2025-09-14 00:43:53,456 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 00:43:53,463 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 00:44:44,468 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 283.50040 ± 161.717
2025-09-14 00:44:44,468 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [112.841896, 40.204365, 347.9827, 395.4705, 601.62646, 377.84457, 255.32172, 96.5414, 233.39816, 373.77203]
2025-09-14 00:44:44,468 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [97.0, 109.0, 196.0, 235.0, 275.0, 214.0, 136.0, 114.0, 114.0, 217.0]
2025-09-14 00:44:44,495 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 29/100 (estimated time remaining: 13 hours, 51 minutes, 35 seconds)
2025-09-14 00:55:32,172 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 00:55:32,180 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 00:56:10,734 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 197.21831 ± 175.438
2025-09-14 00:56:10,734 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [434.12448, 18.305056, 287.20294, 421.13934, 45.500122, 43.82436, 45.610725, 182.43993, 455.22668, 38.809437]
2025-09-14 00:56:10,734 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [231.0, 24.0, 131.0, 270.0, 63.0, 77.0, 45.0, 225.0, 176.0, 54.0]
2025-09-14 00:56:10,740 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 30/100 (estimated time remaining: 13 hours, 35 minutes, 6 seconds)
2025-09-14 01:06:54,472 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 01:06:54,479 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 01:07:47,624 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 331.79938 ± 142.642
2025-09-14 01:07:47,624 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [253.8493, 312.00388, 437.69553, 316.07843, 360.6286, 393.78577, 213.58603, 11.445158, 533.3036, 485.6174]
2025-09-14 01:07:47,624 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [204.0, 159.0, 226.0, 189.0, 176.0, 195.0, 112.0, 19.0, 264.0, 231.0]
2025-09-14 01:07:47,624 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1226 [INFO]: New best (331.80) for latency ExtremeSparseL4U32
2025-09-14 01:07:47,652 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 31/100 (estimated time remaining: 13 hours, 24 minutes, 40 seconds)
2025-09-14 01:18:31,059 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 01:18:31,074 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 01:19:32,496 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 400.80789 ± 118.160
2025-09-14 01:19:32,497 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [236.37726, 468.52032, 399.37216, 440.23895, 256.23846, 575.27637, 546.3636, 350.87302, 246.60428, 488.2143]
2025-09-14 01:19:32,497 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [118.0, 214.0, 192.0, 217.0, 124.0, 312.0, 236.0, 204.0, 180.0, 239.0]
2025-09-14 01:19:32,497 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1226 [INFO]: New best (400.81) for latency ExtremeSparseL4U32
2025-09-14 01:19:32,520 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 32/100 (estimated time remaining: 13 hours, 19 minutes, 5 seconds)
2025-09-14 01:30:10,301 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 01:30:10,308 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 01:31:26,112 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 558.88000 ± 156.668
2025-09-14 01:31:26,113 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [657.8445, 334.69696, 688.9418, 482.91623, 518.2383, 625.76166, 555.83606, 755.007, 253.33966, 716.21765]
2025-09-14 01:31:26,113 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [270.0, 163.0, 280.0, 240.0, 221.0, 231.0, 293.0, 359.0, 189.0, 313.0]
2025-09-14 01:31:26,113 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1226 [INFO]: New best (558.88) for latency ExtremeSparseL4U32
2025-09-14 01:31:26,125 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 33/100 (estimated time remaining: 13 hours, 8 minutes, 48 seconds)
2025-09-14 01:42:20,284 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 01:42:20,291 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 01:43:16,727 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 355.15887 ± 172.172
2025-09-14 01:43:16,727 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [481.13626, 362.40482, 547.03357, 418.59848, 233.68213, 393.24915, 217.39377, 628.0334, 263.69217, 6.3652563]
2025-09-14 01:43:16,727 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [238.0, 191.0, 239.0, 226.0, 121.0, 194.0, 114.0, 383.0, 165.0, 19.0]
2025-09-14 01:43:16,746 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 34/100 (estimated time remaining: 13 hours, 4 minutes, 24 seconds)
2025-09-14 01:53:51,573 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 01:53:51,580 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 01:54:46,902 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 344.04480 ± 148.657
2025-09-14 01:54:46,902 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [256.49738, 7.2315, 393.11093, 355.64206, 472.94608, 544.9014, 493.116, 355.85428, 210.00426, 351.14395]
2025-09-14 01:54:46,902 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [149.0, 21.0, 200.0, 222.0, 243.0, 256.0, 336.0, 149.0, 109.0, 154.0]
2025-09-14 01:54:46,908 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 35/100 (estimated time remaining: 12 hours, 53 minutes, 33 seconds)
2025-09-14 02:05:34,587 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 02:05:34,595 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 02:06:53,111 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 571.84320 ± 129.075
2025-09-14 02:06:53,111 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [582.56134, 578.14575, 573.8026, 611.3888, 644.0479, 486.3536, 713.69403, 717.4839, 238.63911, 572.3151]
2025-09-14 02:06:53,111 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [244.0, 226.0, 271.0, 254.0, 268.0, 354.0, 287.0, 326.0, 150.0, 259.0]
2025-09-14 02:06:53,111 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1226 [INFO]: New best (571.84) for latency ExtremeSparseL4U32
2025-09-14 02:06:53,117 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 36/100 (estimated time remaining: 12 hours, 48 minutes, 11 seconds)
2025-09-14 02:17:45,982 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 02:17:45,989 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 02:18:25,253 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 261.45758 ± 290.192
2025-09-14 02:18:25,253 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [388.5805, 570.6881, 29.848383, 607.033, 799.39435, -0.5523124, 10.352693, 2.693482, 201.19055, 5.346865]
2025-09-14 02:18:25,253 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [242.0, 241.0, 58.0, 266.0, 288.0, 22.0, 27.0, 15.0, 132.0, 24.0]
2025-09-14 02:18:25,283 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 37/100 (estimated time remaining: 12 hours, 33 minutes, 39 seconds)
2025-09-14 02:28:56,607 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 02:28:56,615 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 02:30:17,869 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 598.77045 ± 129.914
2025-09-14 02:30:17,871 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [702.25543, 607.34326, 478.2535, 638.00684, 537.79645, 646.25604, 902.92975, 505.193, 418.78482, 550.88556]
2025-09-14 02:30:17,871 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [344.0, 232.0, 198.0, 341.0, 218.0, 268.0, 463.0, 233.0, 187.0, 250.0]
2025-09-14 02:30:17,871 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1226 [INFO]: New best (598.77) for latency ExtremeSparseL4U32
2025-09-14 02:30:17,881 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 38/100 (estimated time remaining: 12 hours, 21 minutes, 40 seconds)
2025-09-14 02:41:11,754 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 02:41:11,762 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 02:42:19,291 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 517.30371 ± 233.105
2025-09-14 02:42:19,292 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [817.59607, 613.21545, 421.71902, -0.87172794, 554.0753, 220.4039, 749.1726, 605.13745, 558.1944, 634.3944]
2025-09-14 02:42:19,292 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [326.0, 237.0, 193.0, 12.0, 220.0, 132.0, 330.0, 273.0, 241.0, 271.0]
2025-09-14 02:42:19,301 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 39/100 (estimated time remaining: 12 hours, 12 minutes, 7 seconds)
2025-09-14 02:53:03,423 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 02:53:03,430 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 02:54:01,575 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 434.71240 ± 251.306
2025-09-14 02:54:01,575 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [4.047139, 428.7421, 639.04486, 640.22473, 7.460742, 668.3425, 214.43883, 493.5349, 629.85504, 621.43286]
2025-09-14 02:54:01,575 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [13.0, 233.0, 248.0, 273.0, 20.0, 277.0, 124.0, 215.0, 251.0, 302.0]
2025-09-14 02:54:01,609 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 40/100 (estimated time remaining: 12 hours, 2 minutes, 47 seconds)
2025-09-14 03:04:40,925 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 03:04:40,932 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 03:05:50,544 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 482.37436 ± 257.321
2025-09-14 03:05:50,545 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [146.5399, 567.4069, 417.70087, 213.87914, 719.0909, 767.78644, 498.25012, 64.86394, 866.40344, 561.8216]
2025-09-14 03:05:50,545 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [164.0, 233.0, 189.0, 122.0, 297.0, 408.0, 220.0, 93.0, 375.0, 231.0]
2025-09-14 03:05:50,551 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 41/100 (estimated time remaining: 11 hours, 47 minutes, 29 seconds)
2025-09-14 03:16:35,585 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 03:16:35,593 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 03:16:54,167 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 109.58827 ± 209.448
2025-09-14 03:16:54,168 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [-2.5698147, 5.9462214, 17.77289, -0.61160904, 17.161118, -0.7724427, 2.5270076, 472.8291, 578.38293, 5.2172885]
2025-09-14 03:16:54,168 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [23.0, 19.0, 29.0, 20.0, 26.0, 19.0, 27.0, 205.0, 237.0, 15.0]
2025-09-14 03:16:54,174 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 42/100 (estimated time remaining: 11 hours, 30 minutes, 4 seconds)
2025-09-14 03:27:41,553 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 03:27:41,560 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 03:29:04,992 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 587.47913 ± 130.950
2025-09-14 03:29:04,993 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [861.4294, 500.42572, 532.3982, 635.8356, 486.44595, 660.5841, 344.6481, 570.75385, 679.3644, 602.9059]
2025-09-14 03:29:04,993 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [363.0, 211.0, 241.0, 264.0, 249.0, 269.0, 150.0, 427.0, 312.0, 258.0]
2025-09-14 03:29:04,999 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 43/100 (estimated time remaining: 11 hours, 21 minutes, 54 seconds)
2025-09-14 03:39:44,136 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 03:39:44,143 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 03:40:56,892 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 585.68561 ± 126.712
2025-09-14 03:40:56,893 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [737.8111, 719.4971, 408.4149, 597.291, 540.32513, 482.75815, 584.09283, 398.95102, 601.5896, 786.1257]
2025-09-14 03:40:56,893 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [281.0, 303.0, 235.0, 242.0, 245.0, 191.0, 258.0, 167.0, 237.0, 292.0]
2025-09-14 03:40:56,904 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 44/100 (estimated time remaining: 11 hours, 8 minutes, 20 seconds)
2025-09-14 03:51:57,167 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 03:51:57,175 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 03:52:47,912 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 389.40872 ± 216.950
2025-09-14 03:52:47,912 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [565.4425, 49.232796, 401.832, 629.51294, 323.8721, 475.66354, 8.672453, 260.46564, 669.0998, 510.2934]
2025-09-14 03:52:47,912 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [226.0, 69.0, 181.0, 229.0, 145.0, 219.0, 32.0, 132.0, 273.0, 193.0]
2025-09-14 03:52:47,919 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 45/100 (estimated time remaining: 10 hours, 58 minutes, 14 seconds)
2025-09-14 04:03:24,127 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 04:03:24,134 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 04:04:21,365 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 370.11935 ± 263.911
2025-09-14 04:04:21,365 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [11.304135, -1.5997568, 275.59637, 116.51541, 570.56165, 767.9729, 346.93744, 596.04175, 324.9019, 692.962]
2025-09-14 04:04:21,365 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [22.0, 17.0, 142.0, 162.0, 257.0, 312.0, 168.0, 321.0, 183.0, 288.0]
2025-09-14 04:04:21,373 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 46/100 (estimated time remaining: 10 hours, 43 minutes, 39 seconds)
2025-09-14 04:15:03,012 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 04:15:03,034 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 04:15:21,719 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 101.29395 ± 155.178
2025-09-14 04:15:21,719 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [-0.17917007, 5.5257955, 375.39795, -0.413891, -4.519973, 3.594108, 275.8049, 3.8113, -1.5166397, 355.4351]
2025-09-14 04:15:21,719 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [15.0, 20.0, 188.0, 11.0, 11.0, 13.0, 166.0, 19.0, 22.0, 161.0]
2025-09-14 04:15:21,735 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 47/100 (estimated time remaining: 10 hours, 31 minutes, 21 seconds)
2025-09-14 04:26:10,858 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 04:26:10,865 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 04:27:38,733 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 674.54932 ± 149.684
2025-09-14 04:27:38,734 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [544.46954, 557.0266, 627.62604, 754.6376, 651.22815, 861.5806, 1006.23236, 518.09937, 553.3949, 671.198]
2025-09-14 04:27:38,734 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [249.0, 244.0, 257.0, 319.0, 328.0, 441.0, 380.0, 213.0, 248.0, 259.0]
2025-09-14 04:27:38,734 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1226 [INFO]: New best (674.55) for latency ExtremeSparseL4U32
2025-09-14 04:27:38,740 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 48/100 (estimated time remaining: 10 hours, 20 minutes, 45 seconds)
2025-09-14 04:38:26,003 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 04:38:26,012 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 04:39:43,315 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 648.50031 ± 187.420
2025-09-14 04:39:43,317 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [512.6576, 920.50446, 595.93494, 486.81537, 751.9428, 928.4578, 288.05508, 599.0321, 662.0652, 739.5372]
2025-09-14 04:39:43,317 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [214.0, 331.0, 235.0, 212.0, 322.0, 357.0, 121.0, 261.0, 259.0, 289.0]
2025-09-14 04:39:43,335 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 49/100 (estimated time remaining: 10 hours, 11 minutes, 14 seconds)
2025-09-14 04:50:26,912 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 04:50:26,932 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 04:52:03,256 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 783.65930 ± 136.041
2025-09-14 04:52:03,284 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [753.924, 707.16724, 650.95044, 825.4426, 630.8507, 751.68317, 743.7713, 729.71185, 1113.5328, 929.55865]
2025-09-14 04:52:03,284 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [292.0, 334.0, 256.0, 360.0, 257.0, 290.0, 286.0, 302.0, 505.0, 336.0]
2025-09-14 04:52:03,284 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1226 [INFO]: New best (783.66) for latency ExtremeSparseL4U32
2025-09-14 04:52:03,298 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 50/100 (estimated time remaining: 10 hours, 4 minutes, 24 seconds)
2025-09-14 05:02:43,012 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 05:02:43,019 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 05:03:33,508 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 394.01529 ± 261.375
2025-09-14 05:03:33,508 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [593.51495, 23.190868, 676.26416, 716.93115, 571.2013, 23.318817, 432.90976, 507.23364, 373.53424, 22.054075]
2025-09-14 05:03:33,508 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [242.0, 28.0, 246.0, 311.0, 235.0, 27.0, 173.0, 227.0, 151.0, 31.0]
2025-09-14 05:03:33,525 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 51/100 (estimated time remaining: 9 hours, 52 minutes, 1 second)
2025-09-14 05:14:33,871 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 05:14:33,879 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 05:15:42,004 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 529.85486 ± 284.841
2025-09-14 05:15:42,005 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [532.3224, 751.10974, 0.8726363, -4.3427334, 733.91565, 743.71027, 837.93024, 563.04266, 497.23453, 642.75275]
2025-09-14 05:15:42,006 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [237.0, 316.0, 20.0, 13.0, 271.0, 337.0, 323.0, 221.0, 205.0, 312.0]
2025-09-14 05:15:42,022 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 52/100 (estimated time remaining: 9 hours, 51 minutes, 18 seconds)
2025-09-14 05:26:37,630 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 05:26:37,638 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 05:27:38,745 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 503.36362 ± 288.140
2025-09-14 05:27:38,746 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [457.0504, 747.60767, 680.5397, 696.8041, 790.00226, 694.52386, 282.43442, 679.1603, 1.0953712, 4.4181747]
2025-09-14 05:27:38,746 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [199.0, 297.0, 248.0, 259.0, 323.0, 282.0, 176.0, 243.0, 15.0, 21.0]
2025-09-14 05:27:38,766 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 53/100 (estimated time remaining: 9 hours, 36 minutes)
2025-09-14 05:38:04,833 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 05:38:04,840 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 05:39:34,887 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 804.88641 ± 181.427
2025-09-14 05:39:34,888 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [755.4025, 612.78937, 521.0358, 816.0277, 1046.8613, 837.8596, 756.502, 1099.4092, 975.2201, 627.75616]
2025-09-14 05:39:34,888 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [279.0, 246.0, 222.0, 305.0, 361.0, 307.0, 278.0, 398.0, 355.0, 242.0]
2025-09-14 05:39:34,888 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1226 [INFO]: New best (804.89) for latency ExtremeSparseL4U32
2025-09-14 05:39:34,899 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 54/100 (estimated time remaining: 9 hours, 22 minutes, 40 seconds)
2025-09-14 05:50:23,010 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 05:50:23,018 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 05:51:27,420 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 489.37289 ± 255.099
2025-09-14 05:51:27,422 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [673.3529, 567.9599, 444.58292, 0.8994204, -0.9175054, 608.34, 650.8294, 609.063, 598.85455, 740.76483]
2025-09-14 05:51:27,422 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [304.0, 253.0, 215.0, 27.0, 13.0, 293.0, 234.0, 220.0, 258.0, 301.0]
2025-09-14 05:51:27,447 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 55/100 (estimated time remaining: 9 hours, 6 minutes, 30 seconds)
2025-09-14 06:02:08,059 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 06:02:08,066 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 06:03:04,362 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 436.40884 ± 240.678
2025-09-14 06:03:04,362 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [456.99112, 602.2233, 580.71014, 701.7103, 399.9664, 390.33914, 699.20935, 1.6298302, 0.8222828, 530.4867]
2025-09-14 06:03:04,362 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [192.0, 219.0, 256.0, 326.0, 171.0, 144.0, 269.0, 14.0, 26.0, 233.0]
2025-09-14 06:03:04,372 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 56/100 (estimated time remaining: 8 hours, 55 minutes, 37 seconds)
2025-09-14 06:13:47,250 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 06:13:47,257 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 06:14:52,719 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 513.86102 ± 297.614
2025-09-14 06:14:52,720 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [575.0142, 506.7121, 737.4535, 688.25714, 265.9366, 776.74976, -2.938589, 742.7111, 14.791268, 833.923]
2025-09-14 06:14:52,720 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [227.0, 197.0, 298.0, 261.0, 154.0, 320.0, 18.0, 286.0, 25.0, 352.0]
2025-09-14 06:14:52,730 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 57/100 (estimated time remaining: 8 hours, 40 minutes, 46 seconds)
2025-09-14 06:25:42,383 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 06:25:42,390 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 06:26:51,646 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 564.55847 ± 323.331
2025-09-14 06:26:51,647 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [612.8203, 1.7366031, 706.0646, 633.30054, 12.614756, 353.4668, 783.0604, 1016.40344, 658.9686, 867.1485]
2025-09-14 06:26:51,647 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [253.0, 21.0, 250.0, 271.0, 62.0, 151.0, 295.0, 422.0, 240.0, 327.0]
2025-09-14 06:26:51,654 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 58/100 (estimated time remaining: 8 hours, 29 minutes, 14 seconds)
2025-09-14 06:37:47,385 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 06:37:47,392 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 06:38:43,933 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 445.97003 ± 280.369
2025-09-14 06:38:43,933 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [611.8885, 563.4159, 513.2449, 565.3603, 660.99994, 333.33917, 6.5819917, 938.91394, 0.57775676, 265.37808]
2025-09-14 06:38:43,933 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [234.0, 229.0, 202.0, 215.0, 282.0, 146.0, 31.0, 391.0, 15.0, 129.0]
2025-09-14 06:38:43,953 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 59/100 (estimated time remaining: 8 hours, 16 minutes, 52 seconds)
2025-09-14 06:49:12,985 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 06:49:12,992 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 06:50:19,433 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 565.58588 ± 248.406
2025-09-14 06:50:19,433 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [671.7556, 7.2175584, 597.3617, 272.368, 704.5244, 533.31165, 824.6147, 728.0704, 853.2261, 463.40912]
2025-09-14 06:50:19,433 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [254.0, 18.0, 223.0, 142.0, 265.0, 214.0, 331.0, 278.0, 309.0, 193.0]
2025-09-14 06:50:19,451 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 60/100 (estimated time remaining: 8 hours, 2 minutes, 42 seconds)
2025-09-14 07:01:06,691 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 07:01:06,699 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 07:02:18,012 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 578.43945 ± 219.778
2025-09-14 07:02:18,013 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [336.82538, 736.56995, 665.5469, 236.5111, 628.51416, 727.9003, 692.96564, 238.842, 927.18427, 593.53503]
2025-09-14 07:02:18,013 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [152.0, 316.0, 240.0, 135.0, 255.0, 282.0, 248.0, 141.0, 366.0, 231.0]
2025-09-14 07:02:18,034 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 61/100 (estimated time remaining: 7 hours, 53 minutes, 49 seconds)
2025-09-14 07:13:14,866 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 07:13:14,873 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 07:14:22,714 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 550.94690 ± 229.426
2025-09-14 07:14:22,714 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [436.37488, 559.4836, 733.897, 412.6593, 477.92966, 0.9575414, 782.315, 765.35236, 781.5668, 558.9333]
2025-09-14 07:14:22,714 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [210.0, 217.0, 296.0, 181.0, 192.0, 28.0, 313.0, 301.0, 290.0, 250.0]
2025-09-14 07:14:22,725 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 62/100 (estimated time remaining: 7 hours, 44 minutes, 5 seconds)
2025-09-14 07:24:57,036 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 07:24:57,044 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 07:26:17,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 648.84607 ± 140.437
2025-09-14 07:26:17,123 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [408.5214, 594.1282, 673.7074, 761.5236, 887.91394, 430.60126, 594.7912, 680.8983, 697.86646, 758.5091]
2025-09-14 07:26:17,123 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [146.0, 269.0, 251.0, 286.0, 351.0, 183.0, 254.0, 307.0, 264.0, 300.0]
2025-09-14 07:26:17,154 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 63/100 (estimated time remaining: 7 hours, 31 minutes, 37 seconds)
2025-09-14 07:37:03,350 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 07:37:03,358 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 07:38:27,682 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 662.34973 ± 231.435
2025-09-14 07:38:27,682 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [701.1689, 601.61523, 813.57465, 955.691, 675.47437, 841.559, 854.6323, 600.179, 477.9447, 101.658295]
2025-09-14 07:38:27,682 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [282.0, 250.0, 335.0, 406.0, 277.0, 329.0, 339.0, 250.0, 244.0, 136.0]
2025-09-14 07:38:27,691 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 64/100 (estimated time remaining: 7 hours, 21 minutes, 59 seconds)
2025-09-14 07:49:24,325 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 07:49:24,332 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 07:50:41,740 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 645.10651 ± 220.703
2025-09-14 07:50:41,740 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [846.1632, 335.2474, 143.30344, 617.12244, 833.1478, 735.44006, 740.91565, 654.261, 852.46484, 692.9991]
2025-09-14 07:50:41,740 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [296.0, 156.0, 122.0, 241.0, 360.0, 263.0, 282.0, 249.0, 356.0, 282.0]
2025-09-14 07:50:41,770 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 65/100 (estimated time remaining: 7 hours, 14 minutes, 40 seconds)
2025-09-14 08:01:20,783 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 08:01:20,791 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 08:02:37,869 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 658.29944 ± 353.928
2025-09-14 08:02:37,870 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [883.7587, 0.6781074, 3.7743673, 746.15546, 767.8313, 868.9651, 897.233, 813.2139, 1079.2739, 522.11035]
2025-09-14 08:02:37,870 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [311.0, 18.0, 18.0, 294.0, 283.0, 374.0, 337.0, 326.0, 412.0, 220.0]
2025-09-14 08:02:37,878 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 66/100 (estimated time remaining: 7 hours, 2 minutes, 18 seconds)
2025-09-14 08:13:36,548 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 08:13:36,556 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 08:14:41,188 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 486.27310 ± 399.828
2025-09-14 08:14:41,188 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [770.44604, 660.4757, 1372.3315, 91.211075, 687.86066, 464.6958, 515.86096, 294.23605, 0.16789193, 5.4453464]
2025-09-14 08:14:41,188 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [312.0, 269.0, 555.0, 169.0, 256.0, 195.0, 204.0, 161.0, 22.0, 19.0]
2025-09-14 08:14:41,211 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 67/100 (estimated time remaining: 6 hours, 50 minutes, 5 seconds)
2025-09-14 08:25:33,300 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 08:25:33,310 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 08:26:35,566 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 506.03436 ± 291.101
2025-09-14 08:26:35,566 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [-0.31505254, 630.4037, 601.9631, 834.1239, 6.1352363, 618.7318, 716.5467, 753.9838, 244.58751, 654.1832]
2025-09-14 08:26:35,566 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [13.0, 283.0, 236.0, 335.0, 19.0, 253.0, 286.0, 300.0, 127.0, 239.0]
2025-09-14 08:26:35,576 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 68/100 (estimated time remaining: 6 hours, 38 minutes, 1 second)
2025-09-14 08:37:06,917 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 08:37:06,925 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 08:38:12,646 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 542.22302 ± 312.635
2025-09-14 08:38:12,646 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [695.04236, 1063.4093, 716.9463, 639.1032, 667.8248, 464.91443, 458.3834, 2.5116768, 0.7804247, 713.3143]
2025-09-14 08:38:12,646 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [275.0, 419.0, 307.0, 288.0, 261.0, 179.0, 179.0, 13.0, 16.0, 266.0]
2025-09-14 08:38:12,691 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 69/100 (estimated time remaining: 6 hours, 22 minutes, 23 seconds)
2025-09-14 08:49:05,653 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 08:49:05,661 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 08:50:20,481 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 600.70892 ± 262.257
2025-09-14 08:50:20,481 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [630.0887, 432.50455, 446.1744, 664.2779, 846.83856, 1048.1594, 608.17303, 570.6191, 11.585028, 748.6691]
2025-09-14 08:50:20,481 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [264.0, 188.0, 175.0, 241.0, 302.0, 404.0, 250.0, 219.0, 29.0, 420.0]
2025-09-14 08:50:20,517 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 70/100 (estimated time remaining: 6 hours, 9 minutes, 48 seconds)
2025-09-14 09:01:06,867 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 09:01:06,891 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 09:02:35,334 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 751.03601 ± 125.278
2025-09-14 09:02:35,335 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [897.3276, 796.11456, 707.0376, 633.5907, 631.0986, 849.29297, 680.8983, 907.55237, 876.7133, 530.734]
2025-09-14 09:02:35,335 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [334.0, 322.0, 273.0, 253.0, 240.0, 302.0, 262.0, 350.0, 380.0, 211.0]
2025-09-14 09:02:35,343 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 71/100 (estimated time remaining: 5 hours, 59 minutes, 44 seconds)
2025-09-14 09:13:06,403 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 09:13:06,416 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 09:14:28,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 663.25995 ± 185.865
2025-09-14 09:14:28,377 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [793.0863, 347.70367, 773.6954, 831.2787, 739.4635, 775.52905, 536.5751, 668.8331, 849.0157, 317.419]
2025-09-14 09:14:28,377 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [282.0, 211.0, 327.0, 315.0, 267.0, 274.0, 241.0, 317.0, 346.0, 164.0]
2025-09-14 09:14:28,400 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 72/100 (estimated time remaining: 5 hours, 46 minutes, 45 seconds)
2025-09-14 09:25:24,692 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 09:25:24,699 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 09:26:40,546 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 591.24951 ± 238.835
2025-09-14 09:26:40,546 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [680.4483, 565.6393, 703.6399, 866.9599, 586.2418, 346.41986, 697.9929, 825.14685, 636.18414, 3.8217225]
2025-09-14 09:26:40,546 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [246.0, 226.0, 278.0, 431.0, 251.0, 161.0, 271.0, 352.0, 247.0, 18.0]
2025-09-14 09:26:40,580 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 73/100 (estimated time remaining: 5 hours, 36 minutes, 28 seconds)
2025-09-14 09:37:19,891 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 09:37:19,901 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 09:38:41,008 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 718.00970 ± 114.361
2025-09-14 09:38:41,010 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [765.30695, 707.70105, 700.27155, 884.31366, 737.1566, 650.61926, 781.30237, 818.00244, 702.57666, 432.8465]
2025-09-14 09:38:41,010 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [298.0, 271.0, 258.0, 318.0, 276.0, 224.0, 309.0, 303.0, 266.0, 175.0]
2025-09-14 09:38:41,031 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 74/100 (estimated time remaining: 5 hours, 26 minutes, 33 seconds)
2025-09-14 09:49:43,964 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 09:49:43,972 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 09:51:01,508 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 663.09436 ± 289.817
2025-09-14 09:51:01,509 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [532.0474, 746.8671, 12.047551, 688.6472, 675.1758, 1163.574, 478.75192, 961.85, 776.3328, 595.64966]
2025-09-14 09:51:01,509 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [234.0, 275.0, 29.0, 252.0, 281.0, 439.0, 206.0, 332.0, 264.0, 242.0]
2025-09-14 09:51:01,521 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 75/100 (estimated time remaining: 5 hours, 15 minutes, 33 seconds)
2025-09-14 10:01:47,319 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 10:01:47,327 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 10:03:17,281 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 761.65204 ± 349.499
2025-09-14 10:03:17,281 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [640.6228, 977.53534, 688.51245, 566.8971, 661.0467, 953.6134, 665.6222, 1396.8838, 7.2556014, 1058.5314]
2025-09-14 10:03:17,281 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [253.0, 359.0, 259.0, 203.0, 278.0, 349.0, 286.0, 479.0, 33.0, 458.0]
2025-09-14 10:03:17,290 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 76/100 (estimated time remaining: 5 hours, 3 minutes, 29 seconds)
2025-09-14 10:13:48,777 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 10:13:48,785 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 10:14:51,741 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 544.44867 ± 371.130
2025-09-14 10:14:51,741 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [697.8292, 1062.4178, 667.2471, 665.3966, 823.74756, 867.3586, 637.7986, 9.965319, 13.902992, -1.1766949]
2025-09-14 10:14:51,741 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [256.0, 373.0, 251.0, 257.0, 292.0, 353.0, 238.0, 29.0, 43.0, 10.0]
2025-09-14 10:14:51,769 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 77/100 (estimated time remaining: 4 hours, 49 minutes, 52 seconds)
2025-09-14 10:25:59,514 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 10:25:59,522 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 10:27:24,768 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 685.49030 ± 265.554
2025-09-14 10:27:24,769 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [719.14636, 14.166068, 738.26294, 779.6006, 966.94867, 420.233, 641.13074, 869.8258, 863.5658, 842.02246]
2025-09-14 10:27:24,769 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [336.0, 31.0, 295.0, 326.0, 381.0, 184.0, 275.0, 344.0, 334.0, 324.0]
2025-09-14 10:27:24,788 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 78/100 (estimated time remaining: 4 hours, 39 minutes, 23 seconds)
2025-09-14 10:37:49,921 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 10:37:49,930 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 10:39:19,709 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 804.96814 ± 223.475
2025-09-14 10:39:19,730 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [1169.6798, 1068.4762, 557.47986, 660.22015, 976.03625, 384.1599, 854.3873, 773.42474, 753.0786, 852.7386]
2025-09-14 10:39:19,730 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [382.0, 354.0, 233.0, 229.0, 356.0, 170.0, 355.0, 314.0, 265.0, 314.0]
2025-09-14 10:39:19,730 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1226 [INFO]: New best (804.97) for latency ExtremeSparseL4U32
2025-09-14 10:39:19,766 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 79/100 (estimated time remaining: 4 hours, 26 minutes, 50 seconds)
2025-09-14 10:50:07,304 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 10:50:07,312 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 10:51:27,020 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 665.59558 ± 202.587
2025-09-14 10:51:27,020 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [910.8218, 772.562, 731.511, 763.508, 209.79108, 746.95074, 891.6221, 559.77484, 618.3101, 451.10425]
2025-09-14 10:51:27,020 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [335.0, 294.0, 257.0, 294.0, 120.0, 294.0, 368.0, 237.0, 241.0, 198.0]
2025-09-14 10:51:27,064 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 80/100 (estimated time remaining: 4 hours, 13 minutes, 47 seconds)
2025-09-14 11:02:29,146 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 11:02:29,154 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 11:03:48,557 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 691.87109 ± 178.575
2025-09-14 11:03:48,558 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [762.4203, 710.77826, 722.88226, 715.97144, 819.4613, 724.03925, 810.5001, 554.1256, 881.6674, 216.86452]
2025-09-14 11:03:48,558 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [259.0, 278.0, 287.0, 282.0, 337.0, 277.0, 302.0, 210.0, 302.0, 109.0]
2025-09-14 11:03:48,606 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 81/100 (estimated time remaining: 4 hours, 2 minutes, 5 seconds)
2025-09-14 11:14:17,167 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 11:14:17,175 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 11:15:26,312 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 612.83307 ± 304.573
2025-09-14 11:15:26,312 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [707.01086, 762.52704, 767.52673, 5.3570933, 65.89056, 599.5204, 1008.8455, 759.7904, 732.3382, 719.5234]
2025-09-14 11:15:26,312 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [277.0, 268.0, 264.0, 23.0, 100.0, 237.0, 341.0, 279.0, 251.0, 273.0]
2025-09-14 11:15:26,354 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 82/100 (estimated time remaining: 3 hours, 50 minutes, 11 seconds)
2025-09-14 11:26:15,326 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 11:26:15,338 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 11:27:41,468 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 701.41077 ± 173.168
2025-09-14 11:27:41,479 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [631.1585, 765.0924, 890.53687, 758.66296, 627.1432, 672.9837, 835.84827, 881.5948, 259.97507, 691.1123]
2025-09-14 11:27:41,479 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [254.0, 281.0, 399.0, 292.0, 275.0, 247.0, 311.0, 351.0, 160.0, 277.0]
2025-09-14 11:27:41,495 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 83/100 (estimated time remaining: 3 hours, 37 minutes)
2025-09-14 11:38:34,016 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 11:38:34,036 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 11:39:53,014 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 663.11780 ± 260.923
2025-09-14 11:39:53,014 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [757.5836, 532.9155, 582.7959, 864.89777, 657.6088, 223.90703, 725.5484, 429.82364, 1256.7795, 599.3178]
2025-09-14 11:39:53,014 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [316.0, 225.0, 221.0, 323.0, 297.0, 118.0, 261.0, 198.0, 415.0, 251.0]
2025-09-14 11:39:53,047 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 84/100 (estimated time remaining: 3 hours, 25 minutes, 53 seconds)
2025-09-14 11:50:42,569 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 11:50:42,605 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 11:51:54,207 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 663.36298 ± 393.259
2025-09-14 11:51:54,208 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [915.45154, 706.20056, 949.2583, 1126.1027, 852.38696, 938.7693, 871.604, 4.8387, -2.074875, 271.09253]
2025-09-14 11:51:54,208 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [343.0, 241.0, 323.0, 363.0, 310.0, 348.0, 304.0, 17.0, 12.0, 133.0]
2025-09-14 11:51:54,243 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 85/100 (estimated time remaining: 3 hours, 13 minutes, 26 seconds)
2025-09-14 12:02:41,137 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 12:02:41,158 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 12:03:45,477 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 550.97211 ± 394.112
2025-09-14 12:03:45,478 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [436.9402, 828.8058, 804.4552, 9.089331, 813.6231, 1084.8677, 922.5355, -1.5520386, -0.5059801, 611.4623]
2025-09-14 12:03:45,478 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [179.0, 302.0, 368.0, 20.0, 300.0, 374.0, 316.0, 17.0, 10.0, 236.0]
2025-09-14 12:03:45,489 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 86/100 (estimated time remaining: 2 hours, 59 minutes, 50 seconds)
2025-09-14 12:14:43,693 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 12:14:43,701 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 12:16:18,960 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 842.13007 ± 254.113
2025-09-14 12:16:18,961 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [981.1349, 884.4155, 856.59735, 915.3161, 615.387, 317.99142, 1265.7314, 1111.4279, 833.37714, 639.9221]
2025-09-14 12:16:18,961 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [366.0, 329.0, 306.0, 338.0, 226.0, 159.0, 439.0, 386.0, 320.0, 237.0]
2025-09-14 12:16:18,961 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1226 [INFO]: New best (842.13) for latency ExtremeSparseL4U32
2025-09-14 12:16:18,970 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 87/100 (estimated time remaining: 2 hours, 50 minutes, 27 seconds)
2025-09-14 12:27:08,841 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 12:27:08,848 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 12:28:36,033 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 769.39844 ± 264.474
2025-09-14 12:28:36,033 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [892.58856, 893.11206, 951.54364, 794.677, 757.6004, 10.632705, 785.3836, 764.4168, 843.1474, 1000.88184]
2025-09-14 12:28:36,033 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [334.0, 314.0, 331.0, 329.0, 283.0, 33.0, 283.0, 272.0, 330.0, 375.0]
2025-09-14 12:28:36,057 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 88/100 (estimated time remaining: 2 hours, 38 minutes, 21 seconds)
2025-09-14 12:39:01,823 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 12:39:01,860 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 12:40:28,075 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 740.99109 ± 139.154
2025-09-14 12:40:28,075 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [645.10645, 522.54504, 541.8597, 859.3662, 690.898, 944.0815, 746.059, 784.7989, 934.2795, 740.9168]
2025-09-14 12:40:28,076 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [240.0, 234.0, 218.0, 301.0, 255.0, 380.0, 296.0, 317.0, 319.0, 288.0]
2025-09-14 12:40:28,087 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 89/100 (estimated time remaining: 2 hours, 25 minutes, 24 seconds)
2025-09-14 12:51:26,599 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 12:51:26,606 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 12:52:13,415 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 417.04239 ± 345.764
2025-09-14 12:52:13,415 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [606.2002, 0.98542905, -3.5090077, 3.1504097, 634.14557, 593.8353, 760.41614, 4.440082, 768.18134, 802.57825]
2025-09-14 12:52:13,415 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [221.0, 12.0, 11.0, 19.0, 227.0, 239.0, 260.0, 16.0, 287.0, 270.0]
2025-09-14 12:52:13,426 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 90/100 (estimated time remaining: 2 hours, 12 minutes, 42 seconds)
2025-09-14 13:02:56,032 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 13:02:56,041 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 13:03:40,402 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 400.98529 ± 389.137
2025-09-14 13:03:40,402 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [935.3131, 530.1292, 899.8752, 202.34608, 5.2872176, 2.627207, 1.311502, 510.73828, 924.50256, -2.2776184]
2025-09-14 13:03:40,402 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [323.0, 195.0, 299.0, 113.0, 15.0, 14.0, 18.0, 191.0, 311.0, 9.0]
2025-09-14 13:03:40,410 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 91/100 (estimated time remaining: 1 hour, 59 minutes, 49 seconds)
2025-09-14 13:14:29,296 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 13:14:29,309 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 13:15:42,868 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 634.81598 ± 329.749
2025-09-14 13:15:42,869 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [844.83276, 838.1004, 729.4928, 613.857, 878.4468, 943.2984, -5.0047126, 2.3986578, 778.73566, 724.00195]
2025-09-14 13:15:42,869 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [328.0, 278.0, 283.0, 241.0, 327.0, 351.0, 15.0, 18.0, 322.0, 254.0]
2025-09-14 13:15:42,902 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 92/100 (estimated time remaining: 1 hour, 46 minutes, 55 seconds)
2025-09-14 13:26:22,344 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 13:26:22,353 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 13:27:50,077 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 789.04578 ± 206.284
2025-09-14 13:27:50,077 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [528.85504, 752.1526, 848.0891, 825.38776, 778.50854, 637.3089, 1323.2405, 586.9848, 835.2122, 774.71796]
2025-09-14 13:27:50,077 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [226.0, 291.0, 295.0, 277.0, 294.0, 252.0, 484.0, 220.0, 297.0, 305.0]
2025-09-14 13:27:50,087 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 93/100 (estimated time remaining: 1 hour, 34 minutes, 46 seconds)
2025-09-14 13:38:49,774 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 13:38:49,782 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 13:40:00,453 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 610.90265 ± 256.899
2025-09-14 13:40:00,454 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [998.3831, 616.19244, 621.9926, 797.8269, 5.028023, 769.1798, 419.88885, 452.95682, 714.3509, 713.2269]
2025-09-14 13:40:00,454 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [366.0, 260.0, 248.0, 296.0, 21.0, 280.0, 177.0, 180.0, 285.0, 243.0]
2025-09-14 13:40:00,463 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 23 minutes, 21 seconds)
2025-09-14 13:50:29,706 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 13:50:29,714 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 13:51:56,484 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 781.51385 ± 265.080
2025-09-14 13:51:56,507 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [776.91815, 889.73083, 704.1595, 1267.958, 929.3277, 164.08151, 756.2454, 894.63043, 608.21497, 823.87213]
2025-09-14 13:51:56,507 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [287.0, 341.0, 264.0, 414.0, 342.0, 79.0, 270.0, 341.0, 235.0, 292.0]
2025-09-14 13:51:56,528 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 11 minutes, 39 seconds)
2025-09-14 14:02:40,240 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 14:02:40,247 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 14:04:10,058 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 772.32446 ± 245.634
2025-09-14 14:04:10,064 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [880.2638, 1201.7944, 796.2769, 624.4018, 968.0458, 775.08246, 826.08215, 730.4709, 731.77875, 189.04794]
2025-09-14 14:04:10,064 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [320.0, 475.0, 352.0, 265.0, 331.0, 297.0, 294.0, 299.0, 290.0, 85.0]
2025-09-14 14:04:10,078 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 29 seconds)
2025-09-14 14:14:52,008 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 14:14:52,016 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 14:16:00,701 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 619.14758 ± 326.440
2025-09-14 14:16:00,712 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [843.0769, 742.19446, 797.3501, 971.20795, 658.3369, 736.01056, 557.6342, -0.24247326, 878.80426, 7.103257]
2025-09-14 14:16:00,712 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [310.0, 275.0, 294.0, 334.0, 293.0, 273.0, 213.0, 16.0, 290.0, 22.0]
2025-09-14 14:16:00,725 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 97/100 (estimated time remaining: 48 minutes, 14 seconds)
2025-09-14 14:26:50,524 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 14:26:50,533 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 14:28:12,808 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 732.55023 ± 275.121
2025-09-14 14:28:12,809 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [931.20435, 582.88416, 685.4323, 6.224417, 905.7553, 935.86487, 870.0638, 884.68317, 603.28033, 920.1097]
2025-09-14 14:28:12,809 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [330.0, 215.0, 260.0, 18.0, 306.0, 318.0, 342.0, 351.0, 219.0, 339.0]
2025-09-14 14:28:12,817 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 98/100 (estimated time remaining: 36 minutes, 13 seconds)
2025-09-14 14:39:04,440 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 14:39:04,448 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 14:40:37,332 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 850.47833 ± 172.817
2025-09-14 14:40:37,334 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [798.16254, 936.15643, 973.833, 1170.5098, 600.8381, 726.3719, 640.4177, 951.4441, 714.09576, 992.9545]
2025-09-14 14:40:37,334 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [296.0, 365.0, 362.0, 393.0, 226.0, 264.0, 241.0, 330.0, 276.0, 357.0]
2025-09-14 14:40:37,334 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1226 [INFO]: New best (850.48) for latency ExtremeSparseL4U32
2025-09-14 14:40:37,341 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 99/100 (estimated time remaining: 24 minutes, 14 seconds)
2025-09-14 14:51:14,801 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 14:51:14,809 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 14:52:26,703 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 674.33752 ± 266.505
2025-09-14 14:52:26,703 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [884.2333, 802.78815, 880.03876, 937.98663, 411.8701, 725.74445, 809.699, 670.9903, 612.11945, 7.904551]
2025-09-14 14:52:26,703 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [309.0, 272.0, 305.0, 328.0, 153.0, 284.0, 286.0, 233.0, 224.0, 23.0]
2025-09-14 14:52:26,711 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 100/100 (estimated time remaining: 12 minutes, 6 seconds)
2025-09-14 15:03:04,269 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 15:03:04,278 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 15:04:29,202 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 771.77625 ± 152.030
2025-09-14 15:04:29,203 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [872.99756, 729.73065, 674.42114, 770.0136, 956.5559, 641.77234, 877.05963, 1040.1693, 576.552, 578.4897]
2025-09-14 15:04:29,203 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [298.0, 271.0, 256.0, 254.0, 326.0, 227.0, 326.0, 433.0, 233.0, 233.0]
2025-09-14 15:04:29,228 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1251 [DEBUG]: Training session finished
