2025-09-13 18:25:00,538 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noiseperc20-walker2d/ExtremeSparseL4U32-mbpac_memdelay
2025-09-13 18:25:00,538 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noiseperc20-walker2d/ExtremeSparseL4U32-mbpac_memdelay
2025-09-13 18:25:00,538 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x14a1b2d08c50>}
2025-09-13 18:25:00,538 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1111 [DEBUG]: using device: cuda
2025-09-13 18:25:00,542 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1133 [INFO]: Creating new trainer
2025-09-13 18:25:00,559 baseline-mbpac-noiseperc20-walker2d:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-09-13 18:25:00,559 baseline-mbpac-noiseperc20-walker2d:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-13 18:25:00,566 baseline-mbpac-noiseperc20-walker2d:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=6, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 384, batch_first=True)
)
2025-09-13 18:25:01,366 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1194 [DEBUG]: Starting training session...
2025-09-13 18:25:01,366 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 1/100
2025-09-13 18:35:52,889 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:35:52,895 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:36:44,605 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 157.20145 ± 139.012
2025-09-13 18:36:44,606 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [231.29906, 227.58963, 220.31151, 43.61922, 12.670757, -3.9989257, 438.71515, 184.61813, -22.388569, 239.57866]
2025-09-13 18:36:44,606 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [142.0, 148.0, 136.0, 186.0, 212.0, 108.0, 380.0, 122.0, 112.0, 148.0]
2025-09-13 18:36:44,606 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1226 [INFO]: New best (157.20) for latency ExtremeSparseL4U32
2025-09-13 18:36:44,618 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 2/100 (estimated time remaining: 19 hours, 20 minutes, 21 seconds)
2025-09-13 18:47:32,039 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:47:32,046 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:48:24,294 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 84.08910 ± 92.012
2025-09-13 18:48:24,295 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [62.269318, 81.65602, 135.58066, 204.83008, -11.063549, 222.46742, -45.34708, 9.43965, 0.14288677, 180.9156]
2025-09-13 18:48:24,295 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [262.0, 235.0, 251.0, 119.0, 104.0, 132.0, 113.0, 144.0, 97.0, 305.0]
2025-09-13 18:48:24,310 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 3/100 (estimated time remaining: 19 hours, 5 minutes, 44 seconds)
2025-09-13 18:59:12,369 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:59:12,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:59:47,811 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 144.12520 ± 122.237
2025-09-13 18:59:47,811 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [287.24667, 16.339764, 386.7605, 216.29318, 153.42963, 21.612795, 40.926407, 105.59506, 202.61952, 10.428299]
2025-09-13 18:59:47,811 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [174.0, 34.0, 313.0, 133.0, 123.0, 46.0, 98.0, 106.0, 127.0, 22.0]
2025-09-13 18:59:47,843 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 4/100 (estimated time remaining: 18 hours, 44 minutes, 22 seconds)
2025-09-13 19:10:22,423 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:10:22,429 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:10:57,548 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 88.82097 ± 66.603
2025-09-13 19:10:57,548 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [27.412548, 103.24444, 21.014563, 33.486427, 94.00796, 54.7133, 211.22464, 36.25966, 101.276665, 205.5695]
2025-09-13 19:10:57,548 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [70.0, 160.0, 35.0, 139.0, 89.0, 69.0, 125.0, 39.0, 202.0, 225.0]
2025-09-13 19:10:57,559 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 5/100 (estimated time remaining: 18 hours, 22 minutes, 28 seconds)
2025-09-13 19:21:38,690 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:21:38,696 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:22:10,930 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 80.14967 ± 84.770
2025-09-13 19:22:10,930 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [314.9348, 29.174154, 31.301508, 58.05848, 52.07338, 20.46537, 28.937092, 131.97931, 42.058037, 92.51461]
2025-09-13 19:22:10,930 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [245.0, 100.0, 122.0, 96.0, 82.0, 97.0, 46.0, 110.0, 55.0, 133.0]
2025-09-13 19:22:10,944 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 6/100 (estimated time remaining: 18 hours, 6 minutes, 1 second)
2025-09-13 19:32:50,190 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:32:50,197 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:33:20,797 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 101.43125 ± 111.880
2025-09-13 19:33:20,797 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [362.08072, 106.42339, 29.260221, 47.371914, 165.58904, 1.0022902, 7.1124935, 22.046396, 229.41464, 44.011333]
2025-09-13 19:33:20,797 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [280.0, 108.0, 47.0, 98.0, 178.0, 15.0, 18.0, 43.0, 134.0, 84.0]
2025-09-13 19:33:20,803 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 7/100 (estimated time remaining: 17 hours, 44 minutes, 8 seconds)
2025-09-13 19:44:02,779 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:44:02,786 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:44:22,789 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 80.21833 ± 88.155
2025-09-13 19:44:22,789 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [157.68611, 123.36535, 4.2257795, 301.8957, 15.203969, 73.54988, 25.368305, 38.27473, 7.743755, 54.869694]
2025-09-13 19:44:22,789 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [83.0, 124.0, 17.0, 177.0, 24.0, 78.0, 34.0, 53.0, 15.0, 63.0]
2025-09-13 19:44:22,859 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 8/100 (estimated time remaining: 17 hours, 21 minutes, 9 seconds)
2025-09-13 19:55:08,548 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:55:08,554 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:55:16,365 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 9.90873 ± 10.879
2025-09-13 19:55:16,365 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [30.655624, 4.62233, 19.105112, -0.14352874, 27.56657, 2.7536476, 0.8034821, 3.5652168, 6.9231052, 3.235738]
2025-09-13 19:55:16,365 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [37.0, 14.0, 40.0, 28.0, 48.0, 14.0, 32.0, 19.0, 17.0, 13.0]
2025-09-13 19:55:16,379 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 9/100 (estimated time remaining: 17 hours, 45 seconds)
2025-09-13 20:06:05,918 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:06:05,925 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:06:40,946 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 163.81711 ± 148.337
2025-09-13 20:06:40,946 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [53.827377, 113.18058, 432.00317, 325.9097, 297.93195, 28.819866, 293.08743, 57.865456, 4.0886555, 31.457026]
2025-09-13 20:06:40,946 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [61.0, 170.0, 215.0, 194.0, 172.0, 53.0, 180.0, 71.0, 16.0, 43.0]
2025-09-13 20:06:40,946 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1226 [INFO]: New best (163.82) for latency ExtremeSparseL4U32
2025-09-13 20:06:40,976 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 10/100 (estimated time remaining: 16 hours, 54 minutes, 10 seconds)
2025-09-13 20:17:15,699 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:17:15,706 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:17:46,379 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 148.43306 ± 163.704
2025-09-13 20:17:46,379 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [115.84512, 30.53673, 32.94475, 392.98642, 487.33813, 126.04403, 253.26974, 4.6121707, 15.206752, 25.546625]
2025-09-13 20:17:46,379 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [138.0, 40.0, 55.0, 211.0, 226.0, 102.0, 140.0, 18.0, 44.0, 40.0]
2025-09-13 20:17:46,442 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 11/100 (estimated time remaining: 16 hours, 40 minutes, 38 seconds)
2025-09-13 20:28:30,389 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:28:30,395 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:29:55,682 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 393.03143 ± 221.700
2025-09-13 20:29:55,684 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [504.3312, 406.6992, 704.13715, 4.680648, 4.1606207, 510.81012, 316.12112, 611.6688, 501.63556, 366.06992]
2025-09-13 20:29:55,684 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [347.0, 342.0, 551.0, 18.0, 15.0, 382.0, 255.0, 371.0, 354.0, 212.0]
2025-09-13 20:29:55,684 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1226 [INFO]: New best (393.03) for latency ExtremeSparseL4U32
2025-09-13 20:29:55,717 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 12/100 (estimated time remaining: 16 hours, 47 minutes, 9 seconds)
2025-09-13 20:40:37,315 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:40:37,321 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:41:19,461 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 232.73347 ± 234.801
2025-09-13 20:41:19,461 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [401.99265, 606.92584, 7.4511323, 376.5066, 31.80107, 62.896664, 608.43164, 214.15268, 4.799515, 12.3769]
2025-09-13 20:41:19,461 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [218.0, 320.0, 18.0, 204.0, 43.0, 91.0, 327.0, 126.0, 29.0, 25.0]
2025-09-13 20:41:19,485 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 13/100 (estimated time remaining: 16 hours, 42 minutes, 12 seconds)
2025-09-13 20:52:10,488 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:52:10,495 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:52:48,920 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 209.46265 ± 199.405
2025-09-13 20:52:48,920 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [480.37802, 7.0602713, 227.34769, 55.614758, 467.20596, 427.65396, 12.150538, 18.122631, 385.95123, 13.1414385]
2025-09-13 20:52:48,920 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [244.0, 16.0, 164.0, 67.0, 322.0, 193.0, 22.0, 31.0, 204.0, 22.0]
2025-09-13 20:52:48,931 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 14/100 (estimated time remaining: 16 hours, 41 minutes, 14 seconds)
2025-09-13 21:03:30,723 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:03:30,729 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:04:02,896 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 132.30869 ± 131.904
2025-09-13 21:04:02,896 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [22.188072, 106.57369, 382.43054, 269.02338, 278.59915, 3.6912222, 2.6371195, 196.75676, 17.51905, 43.66796]
2025-09-13 21:04:02,896 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [35.0, 88.0, 209.0, 150.0, 324.0, 27.0, 25.0, 132.0, 34.0, 57.0]
2025-09-13 21:04:02,914 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 15/100 (estimated time remaining: 16 hours, 26 minutes, 41 seconds)
2025-09-13 21:14:47,428 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:14:47,435 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:15:41,841 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 242.78714 ± 190.579
2025-09-13 21:15:41,841 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [52.01884, 440.8761, 571.50037, 355.27606, 14.353932, 386.6635, 22.580387, 322.68628, 47.005817, 214.90994]
2025-09-13 21:15:41,841 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [89.0, 266.0, 380.0, 171.0, 34.0, 221.0, 32.0, 185.0, 91.0, 309.0]
2025-09-13 21:15:41,857 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 16/100 (estimated time remaining: 16 hours, 24 minutes, 42 seconds)
2025-09-13 21:26:45,906 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:26:45,914 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:27:20,824 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 116.17683 ± 136.811
2025-09-13 21:27:20,824 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [81.50521, 187.5221, 31.44289, 132.27081, 2.9326105, 12.024322, 0.7304353, 119.79784, 108.092804, 485.4494]
2025-09-13 21:27:20,824 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [105.0, 191.0, 40.0, 155.0, 21.0, 19.0, 13.0, 112.0, 131.0, 374.0]
2025-09-13 21:27:20,879 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 17/100 (estimated time remaining: 16 hours, 4 minutes, 38 seconds)
2025-09-13 21:37:40,624 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:37:40,631 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:38:44,841 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 328.57095 ± 284.817
2025-09-13 21:38:44,841 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [106.170815, 466.9646, 42.29692, 877.879, 201.9259, 431.71228, 411.42676, 2.1622632, 700.4125, 44.758274]
2025-09-13 21:38:44,841 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [136.0, 289.0, 53.0, 465.0, 173.0, 235.0, 261.0, 11.0, 413.0, 73.0]
2025-09-13 21:38:44,850 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 18/100 (estimated time remaining: 15 hours, 53 minutes, 13 seconds)
2025-09-13 21:49:28,811 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:49:28,819 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:50:05,658 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 139.88791 ± 141.751
2025-09-13 21:50:05,658 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [15.324533, 269.4858, 387.9824, 8.763192, 47.370415, 6.356595, 2.7806625, 98.102646, 319.5423, 243.17061]
2025-09-13 21:50:05,658 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [28.0, 216.0, 294.0, 17.0, 76.0, 16.0, 15.0, 110.0, 284.0, 188.0]
2025-09-13 21:50:05,694 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 19/100 (estimated time remaining: 15 hours, 39 minutes, 22 seconds)
2025-09-13 22:00:55,056 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 22:00:55,063 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 22:02:10,542 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 444.42719 ± 401.175
2025-09-13 22:02:10,544 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [2.7463262, 805.0948, 579.9239, 46.678608, 310.79565, 120.263626, 6.051244, 767.2667, 536.4806, 1268.9708]
2025-09-13 22:02:10,544 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [14.0, 347.0, 336.0, 62.0, 237.0, 145.0, 16.0, 411.0, 245.0, 695.0]
2025-09-13 22:02:10,544 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1226 [INFO]: New best (444.43) for latency ExtremeSparseL4U32
2025-09-13 22:02:10,582 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 20/100 (estimated time remaining: 15 hours, 41 minutes, 40 seconds)
2025-09-13 22:13:03,230 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 22:13:03,237 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 22:14:02,748 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 393.84662 ± 264.725
2025-09-13 22:14:02,748 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [556.0099, 611.50323, 5.360221, 6.39595, 2.9282303, 535.8458, 563.8668, 692.67596, 580.36536, 383.5143]
2025-09-13 22:14:02,748 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [218.0, 313.0, 16.0, 15.0, 12.0, 305.0, 294.0, 345.0, 260.0, 164.0]
2025-09-13 22:14:02,786 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 21/100 (estimated time remaining: 15 hours, 33 minutes, 34 seconds)
2025-09-13 22:24:47,457 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 22:24:47,464 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 22:25:47,347 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 406.11981 ± 250.416
2025-09-13 22:25:47,347 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1.9600382, 37.51725, 481.6849, 93.59889, 626.0659, 622.969, 438.0316, 728.39526, 503.48178, 527.4932]
2025-09-13 22:25:47,347 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [13.0, 58.0, 224.0, 141.0, 263.0, 307.0, 184.0, 326.0, 210.0, 254.0]
2025-09-13 22:25:47,374 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 22/100 (estimated time remaining: 15 hours, 23 minutes, 22 seconds)
2025-09-13 22:36:18,703 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 22:36:18,710 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 22:37:11,370 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 332.12131 ± 295.949
2025-09-13 22:37:11,370 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [2.5791876, 754.05225, 775.23975, 17.44481, 600.3796, 137.57315, 56.14195, 361.4479, 78.09459, 538.2603]
2025-09-13 22:37:11,370 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [27.0, 305.0, 363.0, 31.0, 257.0, 192.0, 60.0, 171.0, 103.0, 227.0]
2025-09-13 22:37:11,391 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 23/100 (estimated time remaining: 15 hours, 11 minutes, 42 seconds)
2025-09-13 22:47:56,737 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 22:47:56,743 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 22:49:01,919 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 462.92090 ± 237.444
2025-09-13 22:49:01,920 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1.2442987, 593.0382, 405.7045, 537.64716, 716.95447, 543.1918, 71.157585, 401.50226, 717.6001, 641.1685]
2025-09-13 22:49:01,920 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [10.0, 253.0, 178.0, 212.0, 364.0, 263.0, 99.0, 189.0, 330.0, 262.0]
2025-09-13 22:49:01,920 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1226 [INFO]: New best (462.92) for latency ExtremeSparseL4U32
2025-09-13 22:49:01,934 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 24/100 (estimated time remaining: 15 hours, 7 minutes, 38 seconds)
2025-09-13 23:00:02,008 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 23:00:02,016 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 23:01:24,667 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 607.49268 ± 268.991
2025-09-13 23:01:24,668 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [518.29376, 928.1666, 9.533787, 851.6069, 560.6477, 695.53455, 519.8094, 924.92786, 351.28503, 715.12103]
2025-09-13 23:01:24,668 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [207.0, 461.0, 19.0, 353.0, 239.0, 256.0, 310.0, 363.0, 171.0, 306.0]
2025-09-13 23:01:24,668 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1226 [INFO]: New best (607.49) for latency ExtremeSparseL4U32
2025-09-13 23:01:24,680 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 25/100 (estimated time remaining: 15 hours, 22 seconds)
2025-09-13 23:11:54,997 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 23:11:55,005 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 23:12:50,226 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 330.76190 ± 205.355
2025-09-13 23:12:50,226 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [569.5576, 338.91495, 635.81464, 508.6857, 512.5815, 245.21959, 88.28466, 1.8965569, 210.51831, 196.14558]
2025-09-13 23:12:50,226 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [227.0, 155.0, 285.0, 242.0, 200.0, 114.0, 117.0, 11.0, 245.0, 226.0]
2025-09-13 23:12:50,239 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 26/100 (estimated time remaining: 14 hours, 41 minutes, 51 seconds)
2025-09-13 23:23:34,891 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 23:23:34,898 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 23:24:49,725 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 555.23425 ± 258.686
2025-09-13 23:24:49,727 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [666.7557, 726.9639, 584.71155, 779.2581, 812.3211, 173.41452, 2.1421638, 588.7497, 448.54352, 769.4823]
2025-09-13 23:24:49,727 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [278.0, 297.0, 267.0, 322.0, 354.0, 205.0, 14.0, 250.0, 213.0, 322.0]
2025-09-13 23:24:49,762 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 27/100 (estimated time remaining: 14 hours, 33 minutes, 47 seconds)
2025-09-13 23:35:49,554 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 23:35:49,562 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 23:36:51,565 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 429.09320 ± 293.180
2025-09-13 23:36:51,566 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [94.5876, 9.55776, 736.9446, 239.26953, 5.0791097, 623.8499, 629.03625, 664.6918, 776.80743, 511.1083]
2025-09-13 23:36:51,566 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [106.0, 25.0, 372.0, 115.0, 15.0, 254.0, 263.0, 314.0, 363.0, 252.0]
2025-09-13 23:36:51,581 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 28/100 (estimated time remaining: 14 hours, 31 minutes, 10 seconds)
2025-09-13 23:47:29,108 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 23:47:29,116 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 23:48:15,509 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 322.98965 ± 277.733
2025-09-13 23:48:15,509 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [360.14688, 8.3169365, 4.6116767, 581.435, 15.564819, 5.6684017, 655.19525, 352.47168, 712.85583, 533.6298]
2025-09-13 23:48:15,509 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [191.0, 23.0, 15.0, 268.0, 24.0, 14.0, 294.0, 172.0, 330.0, 227.0]
2025-09-13 23:48:15,522 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 29/100 (estimated time remaining: 14 hours, 12 minutes, 51 seconds)
2025-09-13 23:58:58,818 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 23:58:58,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 00:00:00,979 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 472.25031 ± 230.424
2025-09-14 00:00:00,980 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [59.14192, 599.58905, 659.4698, 614.90894, 606.7746, 529.3301, 587.7052, 14.798014, 385.60654, 665.17883]
2025-09-14 00:00:00,980 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [95.0, 247.0, 308.0, 267.0, 254.0, 215.0, 215.0, 30.0, 185.0, 281.0]
2025-09-14 00:00:00,987 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 30/100 (estimated time remaining: 13 hours, 52 minutes, 11 seconds)
2025-09-14 00:10:51,171 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 00:10:51,179 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 00:11:37,956 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 307.91296 ± 271.620
2025-09-14 00:11:37,956 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [620.1219, 82.76262, 156.80618, 664.8859, 459.08163, 424.40964, 6.053093, 2.2751942, 1.7540219, 660.97943]
2025-09-14 00:11:37,956 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [275.0, 95.0, 162.0, 273.0, 245.0, 177.0, 17.0, 13.0, 12.0, 311.0]
2025-09-14 00:11:37,982 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 31/100 (estimated time remaining: 13 hours, 43 minutes, 8 seconds)
2025-09-14 00:22:27,319 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 00:22:27,327 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 00:23:52,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 657.39954 ± 260.145
2025-09-14 00:23:52,252 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [755.37524, 646.4903, 582.18915, 890.8779, 751.61896, 745.2291, 1048.2983, 611.0185, 533.5622, 9.335326]
2025-09-14 00:23:52,252 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [397.0, 287.0, 227.0, 368.0, 305.0, 340.0, 446.0, 229.0, 235.0, 20.0]
2025-09-14 00:23:52,252 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1226 [INFO]: New best (657.40) for latency ExtremeSparseL4U32
2025-09-14 00:23:52,261 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 32/100 (estimated time remaining: 13 hours, 34 minutes, 46 seconds)
2025-09-14 00:34:23,579 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 00:34:23,587 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 00:35:23,315 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 390.61310 ± 304.606
2025-09-14 00:35:23,315 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [873.6068, 276.96265, 230.39496, 3.4636607, 691.7618, 701.3514, 0.8818417, 694.7945, 299.41742, 133.4958]
2025-09-14 00:35:23,315 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [341.0, 189.0, 118.0, 17.0, 290.0, 307.0, 28.0, 353.0, 143.0, 156.0]
2025-09-14 00:35:23,325 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 33/100 (estimated time remaining: 13 hours, 15 minutes, 59 seconds)
2025-09-14 00:46:08,456 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 00:46:08,464 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 00:47:34,722 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 685.24481 ± 326.414
2025-09-14 00:47:34,723 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1046.6521, 872.91, 13.386962, 569.6968, 812.2325, 1004.22437, 711.9979, 799.9289, 162.70314, 858.71564]
2025-09-14 00:47:34,723 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [455.0, 311.0, 27.0, 243.0, 337.0, 372.0, 316.0, 311.0, 130.0, 377.0]
2025-09-14 00:47:34,723 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1226 [INFO]: New best (685.24) for latency ExtremeSparseL4U32
2025-09-14 00:47:34,731 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 34/100 (estimated time remaining: 13 hours, 14 minutes, 53 seconds)
2025-09-14 00:58:15,450 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 00:58:15,457 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 00:59:46,160 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 692.91486 ± 159.041
2025-09-14 00:59:46,162 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [821.54626, 364.47598, 683.60315, 783.53613, 578.6872, 530.71747, 860.8394, 760.9651, 639.337, 905.441]
2025-09-14 00:59:46,162 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [326.0, 180.0, 321.0, 312.0, 220.0, 247.0, 360.0, 329.0, 307.0, 366.0]
2025-09-14 00:59:46,162 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1226 [INFO]: New best (692.91) for latency ExtremeSparseL4U32
2025-09-14 00:59:46,177 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 35/100 (estimated time remaining: 13 hours, 8 minutes, 44 seconds)
2025-09-14 01:10:40,212 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 01:10:40,220 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 01:12:09,345 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 648.73291 ± 243.642
2025-09-14 01:12:09,346 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [801.82806, 588.9981, 784.52856, 538.9517, 860.8991, 727.3367, 125.36435, 1027.8734, 389.47046, 642.07794]
2025-09-14 01:12:09,346 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [313.0, 227.0, 319.0, 257.0, 326.0, 292.0, 164.0, 501.0, 206.0, 303.0]
2025-09-14 01:12:09,361 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 36/100 (estimated time remaining: 13 hours, 6 minutes, 47 seconds)
2025-09-14 01:22:43,704 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 01:22:43,711 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 01:24:03,371 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 607.48846 ± 333.116
2025-09-14 01:24:03,372 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [739.06586, 869.42773, 977.0875, 613.1563, 177.85396, 5.668302, 1010.0957, 869.2086, 527.227, 286.09344]
2025-09-14 01:24:03,372 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [346.0, 376.0, 402.0, 292.0, 109.0, 16.0, 360.0, 362.0, 276.0, 151.0]
2025-09-14 01:24:03,381 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 37/100 (estimated time remaining: 12 hours, 50 minutes, 22 seconds)
2025-09-14 01:34:48,299 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 01:34:48,307 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 01:35:56,329 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 503.99258 ± 317.795
2025-09-14 01:35:56,331 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [599.7689, 736.8852, 1055.9359, 665.0323, 41.352932, 4.652122, 269.07248, 321.64337, 613.00256, 732.58026]
2025-09-14 01:35:56,331 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [282.0, 328.0, 504.0, 293.0, 50.0, 18.0, 142.0, 150.0, 233.0, 287.0]
2025-09-14 01:35:56,355 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 38/100 (estimated time remaining: 12 hours, 42 minutes, 56 seconds)
2025-09-14 01:46:48,026 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 01:46:48,034 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 01:48:38,726 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 844.66418 ± 485.772
2025-09-14 01:48:38,728 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1245.3141, 1477.2919, 815.95905, 1294.0317, 981.06274, 1079.9575, 554.7969, 992.0691, 4.2371054, 1.9216363]
2025-09-14 01:48:38,728 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [495.0, 599.0, 342.0, 633.0, 382.0, 478.0, 220.0, 421.0, 31.0, 25.0]
2025-09-14 01:48:38,729 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1226 [INFO]: New best (844.66) for latency ExtremeSparseL4U32
2025-09-14 01:48:38,740 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 39/100 (estimated time remaining: 12 hours, 37 minutes, 13 seconds)
2025-09-14 01:59:20,910 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 01:59:20,917 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 02:00:53,632 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 780.92542 ± 360.737
2025-09-14 02:00:53,633 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [853.88617, 877.87604, 784.07043, 1039.3397, 16.622972, 658.7903, 896.11786, 712.65784, 1494.6262, 475.26678]
2025-09-14 02:00:53,633 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [342.0, 336.0, 303.0, 429.0, 31.0, 302.0, 323.0, 284.0, 532.0, 231.0]
2025-09-14 02:00:53,641 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 40/100 (estimated time remaining: 12 hours, 25 minutes, 43 seconds)
2025-09-14 02:11:40,093 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 02:11:40,101 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 02:12:54,437 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 575.29602 ± 418.242
2025-09-14 02:12:54,438 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [984.09625, 812.44904, 847.69824, 821.0114, 2.36914, 7.1888704, 1.1228591, 910.8091, 1053.8147, 312.40097]
2025-09-14 02:12:54,438 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [471.0, 305.0, 344.0, 327.0, 19.0, 23.0, 13.0, 389.0, 373.0, 185.0]
2025-09-14 02:12:54,446 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 41/100 (estimated time remaining: 12 hours, 9 minutes, 1 second)
2025-09-14 02:23:58,964 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 02:23:58,972 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 02:25:14,660 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 567.04333 ± 470.250
2025-09-14 02:25:14,661 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [28.494196, 1382.0288, 1155.7903, -1.6211731, 722.076, 593.4235, 589.4179, 961.0052, 178.28564, 61.53315]
2025-09-14 02:25:14,661 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [37.0, 581.0, 449.0, 11.0, 278.0, 224.0, 253.0, 433.0, 117.0, 80.0]
2025-09-14 02:25:14,670 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 42/100 (estimated time remaining: 12 hours, 2 minutes, 1 second)
2025-09-14 02:35:47,742 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 02:35:47,750 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 02:36:58,854 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 558.01471 ± 253.520
2025-09-14 02:36:58,856 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [689.10986, 532.89954, -0.23998706, 524.1334, 863.1423, 603.9112, 926.34375, 295.39178, 495.8171, 649.638]
2025-09-14 02:36:58,856 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [271.0, 227.0, 23.0, 223.0, 352.0, 245.0, 385.0, 135.0, 223.0, 262.0]
2025-09-14 02:36:58,880 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 43/100 (estimated time remaining: 11 hours, 48 minutes, 5 seconds)
2025-09-14 02:47:58,170 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 02:47:58,177 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 02:49:20,945 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 636.15588 ± 209.641
2025-09-14 02:49:20,946 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [642.76666, 727.862, 340.32553, 948.6794, 451.81845, 296.27966, 648.007, 938.8048, 691.73804, 675.2775]
2025-09-14 02:49:20,946 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [253.0, 318.0, 169.0, 388.0, 193.0, 151.0, 262.0, 406.0, 278.0, 325.0]
2025-09-14 02:49:20,956 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 44/100 (estimated time remaining: 11 hours, 32 minutes, 1 second)
2025-09-14 02:59:53,955 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 02:59:53,963 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 03:01:41,615 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 870.31134 ± 260.109
2025-09-14 03:01:41,616 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1419.1509, 373.42346, 729.1417, 825.7908, 993.0463, 945.04504, 1015.8768, 663.8983, 973.95825, 763.7823]
2025-09-14 03:01:41,616 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [636.0, 171.0, 311.0, 325.0, 404.0, 365.0, 404.0, 277.0, 406.0, 303.0]
2025-09-14 03:01:41,616 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1226 [INFO]: New best (870.31) for latency ExtremeSparseL4U32
2025-09-14 03:01:41,632 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 45/100 (estimated time remaining: 11 hours, 20 minutes, 57 seconds)
2025-09-14 03:12:31,214 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 03:12:31,222 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 03:13:34,009 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 473.17841 ± 371.520
2025-09-14 03:13:34,009 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [874.6206, 182.8634, 717.21924, 1.6077987, 15.227625, 12.10736, 638.35333, 812.92896, 1009.68994, 467.166]
2025-09-14 03:13:34,009 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [373.0, 110.0, 271.0, 14.0, 29.0, 29.0, 294.0, 326.0, 418.0, 196.0]
2025-09-14 03:13:34,031 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 46/100 (estimated time remaining: 11 hours, 7 minutes, 15 seconds)
2025-09-14 03:24:33,348 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 03:24:33,355 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 03:26:11,706 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 788.43274 ± 299.402
2025-09-14 03:26:11,708 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [602.8756, 1004.3599, 1011.6193, 396.06726, 844.971, 1012.3427, 1048.1748, 155.79718, 725.40894, 1082.7103]
2025-09-14 03:26:11,708 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [241.0, 391.0, 426.0, 175.0, 371.0, 375.0, 486.0, 113.0, 295.0, 383.0]
2025-09-14 03:26:11,720 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 47/100 (estimated time remaining: 10 hours, 58 minutes, 16 seconds)
2025-09-14 03:36:49,585 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 03:36:49,594 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 03:38:25,265 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 760.15332 ± 233.713
2025-09-14 03:38:25,266 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [192.8619, 1010.16974, 661.86487, 704.6427, 1046.8842, 962.76, 711.8496, 647.2231, 814.6423, 848.6351]
2025-09-14 03:38:25,266 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [116.0, 383.0, 280.0, 330.0, 380.0, 388.0, 274.0, 294.0, 313.0, 367.0]
2025-09-14 03:38:25,276 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 48/100 (estimated time remaining: 10 hours, 51 minutes, 15 seconds)
2025-09-14 03:49:15,638 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 03:49:15,645 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 03:50:49,622 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 757.67352 ± 196.275
2025-09-14 03:50:49,623 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [243.62929, 791.82513, 978.04205, 768.9134, 703.782, 747.04034, 728.62085, 741.0275, 973.9367, 899.918]
2025-09-14 03:50:49,623 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [135.0, 313.0, 408.0, 321.0, 284.0, 299.0, 296.0, 302.0, 364.0, 348.0]
2025-09-14 03:50:49,630 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 49/100 (estimated time remaining: 10 hours, 39 minutes, 22 seconds)
2025-09-14 04:01:47,496 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 04:01:47,503 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 04:03:17,363 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 753.25867 ± 204.646
2025-09-14 04:03:17,364 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [731.39874, 781.5782, 678.06635, 1055.6539, 607.786, 392.32065, 741.999, 1049.3818, 545.8365, 948.56573]
2025-09-14 04:03:17,364 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [258.0, 371.0, 279.0, 366.0, 215.0, 163.0, 288.0, 473.0, 212.0, 363.0]
2025-09-14 04:03:17,380 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 50/100 (estimated time remaining: 10 hours, 28 minutes, 16 seconds)
2025-09-14 04:13:53,863 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 04:13:53,871 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 04:14:48,191 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 416.45532 ± 345.514
2025-09-14 04:14:48,191 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [222.05739, 2.1045399, 3.2412846, 788.8871, 717.5508, 729.6273, 873.68066, 643.95886, 6.8094296, 176.63603]
2025-09-14 04:14:48,191 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [203.0, 15.0, 18.0, 300.0, 269.0, 252.0, 326.0, 295.0, 20.0, 112.0]
2025-09-14 04:14:48,202 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 51/100 (estimated time remaining: 10 hours, 12 minutes, 21 seconds)
2025-09-14 04:25:27,135 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 04:25:27,154 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 04:27:02,924 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 721.12854 ± 157.225
2025-09-14 04:27:02,925 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [448.13736, 626.54974, 816.8068, 792.79834, 698.72516, 1029.2622, 544.0395, 818.575, 637.2158, 799.17523]
2025-09-14 04:27:02,925 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [190.0, 270.0, 524.0, 285.0, 319.0, 420.0, 212.0, 374.0, 264.0, 313.0]
2025-09-14 04:27:02,950 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 52/100 (estimated time remaining: 9 hours, 56 minutes, 22 seconds)
2025-09-14 04:37:53,291 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 04:37:53,298 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 04:39:11,450 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 654.53503 ± 254.309
2025-09-14 04:39:11,451 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [431.15082, 708.03534, 967.1564, 739.75104, 1078.3782, 522.62476, 189.51736, 612.80835, 455.7876, 840.1401]
2025-09-14 04:39:11,451 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [168.0, 287.0, 344.0, 325.0, 380.0, 191.0, 127.0, 261.0, 212.0, 308.0]
2025-09-14 04:39:11,461 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 53/100 (estimated time remaining: 9 hours, 43 minutes, 23 seconds)
2025-09-14 04:50:06,884 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 04:50:06,891 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 04:51:18,814 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 597.84460 ± 427.273
2025-09-14 04:51:18,816 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1182.8165, 749.3407, 931.7188, 412.967, 869.6394, 939.70215, 872.41296, 9.468626, 9.305183, 1.0747072]
2025-09-14 04:51:18,816 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [442.0, 279.0, 320.0, 200.0, 373.0, 388.0, 334.0, 22.0, 25.0, 23.0]
2025-09-14 04:51:18,836 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 54/100 (estimated time remaining: 9 hours, 28 minutes, 34 seconds)
2025-09-14 05:02:09,461 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 05:02:09,470 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 05:03:55,132 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 885.11218 ± 133.957
2025-09-14 05:03:55,134 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [859.3408, 741.5631, 1035.5197, 761.9829, 1054.7567, 982.27826, 816.45544, 1088.5231, 790.0265, 720.6758]
2025-09-14 05:03:55,134 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [337.0, 287.0, 422.0, 286.0, 406.0, 374.0, 354.0, 377.0, 342.0, 267.0]
2025-09-14 05:03:55,134 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1226 [INFO]: New best (885.11) for latency ExtremeSparseL4U32
2025-09-14 05:03:55,144 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 55/100 (estimated time remaining: 9 hours, 17 minutes, 47 seconds)
2025-09-14 05:14:39,103 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 05:14:39,110 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 05:15:15,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 266.00156 ± 308.923
2025-09-14 05:15:15,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [522.385, 940.33575, 352.1887, 559.6145, 1.6497523, 6.948321, 3.7904723, 8.590368, 4.861695, 259.65085]
2025-09-14 05:15:15,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [235.0, 337.0, 174.0, 242.0, 17.0, 22.0, 15.0, 26.0, 18.0, 132.0]
2025-09-14 05:15:15,408 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 56/100 (estimated time remaining: 9 hours, 4 minutes, 4 seconds)
2025-09-14 05:26:09,309 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 05:26:09,316 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 05:27:30,785 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 702.14526 ± 271.795
2025-09-14 05:27:30,787 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [786.44666, 503.2626, 874.07434, 883.3041, 721.81976, 834.98425, 7.5055337, 634.40234, 1054.5376, 721.1151]
2025-09-14 05:27:30,787 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [308.0, 210.0, 322.0, 348.0, 250.0, 333.0, 15.0, 271.0, 408.0, 283.0]
2025-09-14 05:27:30,802 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 57/100 (estimated time remaining: 8 hours, 52 minutes, 5 seconds)
2025-09-14 05:38:24,020 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 05:38:24,030 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 05:40:14,304 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 957.99823 ± 186.040
2025-09-14 05:40:14,306 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [601.35297, 1066.2694, 828.69666, 904.0887, 1015.2404, 760.2954, 1054.9602, 1048.5682, 1314.0708, 986.4391]
2025-09-14 05:40:14,306 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [253.0, 398.0, 326.0, 360.0, 346.0, 314.0, 362.0, 384.0, 516.0, 405.0]
2025-09-14 05:40:14,306 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1226 [INFO]: New best (958.00) for latency ExtremeSparseL4U32
2025-09-14 05:40:14,372 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 58/100 (estimated time remaining: 8 hours, 45 minutes, 1 second)
2025-09-14 05:50:52,576 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 05:50:52,584 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 05:52:35,187 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 899.74609 ± 151.132
2025-09-14 05:52:35,188 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [642.98456, 673.7244, 985.7414, 968.5895, 913.2295, 958.61035, 1112.3964, 796.42206, 1093.3418, 852.4204]
2025-09-14 05:52:35,188 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [255.0, 286.0, 376.0, 377.0, 321.0, 352.0, 393.0, 323.0, 385.0, 310.0]
2025-09-14 05:52:35,198 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 59/100 (estimated time remaining: 8 hours, 34 minutes, 41 seconds)
2025-09-14 06:03:34,953 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 06:03:34,960 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 06:05:06,098 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 768.83472 ± 379.849
2025-09-14 06:05:06,100 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1024.7355, 332.43982, 605.7248, 729.05475, 1453.8073, 858.84576, 885.07306, 1068.294, 720.4331, 9.93947]
2025-09-14 06:05:06,100 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [404.0, 128.0, 245.0, 272.0, 542.0, 341.0, 351.0, 395.0, 283.0, 41.0]
2025-09-14 06:05:06,115 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 60/100 (estimated time remaining: 8 hours, 21 minutes, 41 seconds)
2025-09-14 06:15:28,668 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 06:15:28,678 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 06:16:35,448 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 563.59790 ± 373.667
2025-09-14 06:16:35,449 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [907.6349, 11.742427, -1.6898679, 698.04565, 689.5712, 831.2178, 8.48942, 784.0963, 745.1061, 961.76526]
2025-09-14 06:16:35,449 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [331.0, 26.0, 21.0, 289.0, 291.0, 296.0, 26.0, 312.0, 311.0, 341.0]
2025-09-14 06:16:35,467 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 61/100 (estimated time remaining: 8 hours, 10 minutes, 40 seconds)
2025-09-14 06:27:24,542 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 06:27:24,551 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 06:29:05,590 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 905.43005 ± 220.764
2025-09-14 06:29:05,592 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [984.07996, 345.15408, 1118.0806, 975.11676, 860.79095, 821.81006, 1140.0443, 1112.7966, 805.6757, 890.7518]
2025-09-14 06:29:05,592 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [352.0, 141.0, 417.0, 359.0, 342.0, 319.0, 405.0, 389.0, 305.0, 346.0]
2025-09-14 06:29:05,639 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 62/100 (estimated time remaining: 8 hours, 19 seconds)
2025-09-14 06:40:04,284 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 06:40:04,312 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 06:41:28,915 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 699.09460 ± 286.275
2025-09-14 06:41:28,916 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [775.2288, 616.3481, 541.6456, 1031.6178, 668.40094, 4.7188406, 787.92126, 1112.7247, 789.34045, 662.99945]
2025-09-14 06:41:28,916 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [296.0, 223.0, 223.0, 372.0, 303.0, 18.0, 366.0, 392.0, 334.0, 285.0]
2025-09-14 06:41:28,928 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 63/100 (estimated time remaining: 7 hours, 45 minutes, 26 seconds)
2025-09-14 06:52:16,251 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 06:52:16,261 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 06:53:38,568 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 652.90985 ± 393.852
2025-09-14 06:53:38,569 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [786.9682, 752.54095, 734.81415, 464.7949, 5.7200856, 947.06226, 822.6797, 3.9602773, 1377.792, 632.766]
2025-09-14 06:53:38,569 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [269.0, 267.0, 262.0, 206.0, 25.0, 361.0, 319.0, 14.0, 722.0, 239.0]
2025-09-14 06:53:38,589 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 64/100 (estimated time remaining: 7 hours, 31 minutes, 49 seconds)
2025-09-14 07:04:21,097 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 07:04:21,104 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 07:05:39,379 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 656.30957 ± 323.316
2025-09-14 07:05:39,379 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [4.2692637, 679.7176, 1161.3204, 950.68335, 578.33325, 614.64935, 230.78238, 762.36975, 645.9258, 935.04456]
2025-09-14 07:05:39,379 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [18.0, 298.0, 405.0, 368.0, 217.0, 243.0, 180.0, 281.0, 253.0, 359.0]
2025-09-14 07:05:39,425 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 65/100 (estimated time remaining: 7 hours, 15 minutes, 59 seconds)
2025-09-14 07:16:21,820 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 07:16:21,828 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 07:17:50,994 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 687.87164 ± 261.368
2025-09-14 07:17:50,995 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [602.0724, 368.27008, 800.46326, 838.32574, 760.29767, 706.56195, 885.03864, 911.0858, 69.77583, 936.82465]
2025-09-14 07:17:50,995 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [231.0, 319.0, 294.0, 332.0, 318.0, 263.0, 344.0, 354.0, 121.0, 374.0]
2025-09-14 07:17:51,007 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 66/100 (estimated time remaining: 7 hours, 8 minutes, 48 seconds)
2025-09-14 07:28:51,163 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 07:28:51,182 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 07:30:25,758 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 814.31555 ± 477.876
2025-09-14 07:30:25,759 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1032.672, 1018.65045, 3.6085749, 6.5754247, 1063.4275, 1437.6221, 1021.2209, 1187.626, 1016.6348, 355.1171]
2025-09-14 07:30:25,759 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [382.0, 370.0, 17.0, 25.0, 474.0, 517.0, 379.0, 433.0, 401.0, 151.0]
2025-09-14 07:30:25,822 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 67/100 (estimated time remaining: 6 hours, 57 minutes, 5 seconds)
2025-09-14 07:41:03,026 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 07:41:03,035 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 07:42:23,936 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 731.33282 ± 275.106
2025-09-14 07:42:23,937 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [857.81616, 723.8175, 559.78143, 8.645936, 794.0744, 691.6101, 946.2631, 803.7854, 1073.4509, 854.0831]
2025-09-14 07:42:23,937 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [313.0, 246.0, 218.0, 27.0, 264.0, 247.0, 320.0, 286.0, 440.0, 308.0]
2025-09-14 07:42:23,951 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 68/100 (estimated time remaining: 6 hours, 42 minutes, 3 seconds)
2025-09-14 07:53:32,013 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 07:53:32,021 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 07:54:56,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 752.73883 ± 508.093
2025-09-14 07:54:56,167 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [0.5484402, 1.782035, 1560.0189, 867.44086, 1021.69244, 919.2591, 718.56055, 188.8237, 911.1385, 1338.1237]
2025-09-14 07:54:56,167 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [28.0, 13.0, 524.0, 334.0, 361.0, 338.0, 260.0, 99.0, 340.0, 492.0]
2025-09-14 07:54:56,185 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 69/100 (estimated time remaining: 6 hours, 32 minutes, 16 seconds)
2025-09-14 08:05:30,894 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 08:05:30,904 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 08:07:02,800 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 832.21552 ± 317.932
2025-09-14 08:07:02,802 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [956.12634, 760.7474, 675.5643, 1088.9526, 843.98895, 966.301, 876.20624, 1278.4116, 867.9423, 7.914185]
2025-09-14 08:07:02,802 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [356.0, 318.0, 276.0, 380.0, 311.0, 387.0, 310.0, 420.0, 300.0, 32.0]
2025-09-14 08:07:02,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 70/100 (estimated time remaining: 6 hours, 20 minutes, 37 seconds)
2025-09-14 08:17:56,024 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 08:17:56,032 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 08:19:21,264 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 737.58276 ± 299.072
2025-09-14 08:19:21,266 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [988.7658, 296.3596, 931.3116, 921.41705, 1231.0371, 200.65831, 597.463, 646.3801, 811.1954, 751.23987]
2025-09-14 08:19:21,266 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [369.0, 142.0, 338.0, 329.0, 396.0, 104.0, 256.0, 254.0, 310.0, 336.0]
2025-09-14 08:19:21,277 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 71/100 (estimated time remaining: 6 hours, 9 minutes, 1 second)
2025-09-14 08:30:01,179 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 08:30:01,186 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 08:31:49,494 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 949.51727 ± 140.396
2025-09-14 08:31:49,498 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [711.57764, 964.3555, 1100.255, 1015.24347, 1003.26917, 1124.699, 1030.5363, 933.39044, 680.24146, 931.6047]
2025-09-14 08:31:49,498 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [265.0, 401.0, 385.0, 365.0, 355.0, 401.0, 394.0, 343.0, 251.0, 385.0]
2025-09-14 08:31:49,513 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 72/100 (estimated time remaining: 5 hours, 56 minutes, 5 seconds)
2025-09-14 08:42:34,312 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 08:42:34,320 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 08:44:07,265 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 822.55060 ± 235.896
2025-09-14 08:44:07,266 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [649.678, 231.68387, 908.8109, 945.34735, 1047.7253, 893.6764, 940.6013, 661.6431, 889.5671, 1056.7732]
2025-09-14 08:44:07,266 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [276.0, 110.0, 370.0, 417.0, 368.0, 327.0, 327.0, 250.0, 303.0, 363.0]
2025-09-14 08:44:07,293 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 73/100 (estimated time remaining: 5 hours, 45 minutes, 38 seconds)
2025-09-14 08:55:04,171 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 08:55:04,178 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 08:56:35,392 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 824.73645 ± 244.097
2025-09-14 08:56:35,393 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [785.58453, 1047.8196, 694.37036, 231.70187, 944.3599, 848.58386, 850.4768, 789.1085, 1223.4669, 831.8913]
2025-09-14 08:56:35,393 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [277.0, 364.0, 260.0, 109.0, 335.0, 329.0, 286.0, 278.0, 412.0, 352.0]
2025-09-14 08:56:35,408 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 74/100 (estimated time remaining: 5 hours, 32 minutes, 55 seconds)
2025-09-14 09:07:25,005 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 09:07:25,025 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 09:09:01,726 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 896.94659 ± 123.277
2025-09-14 09:09:01,728 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [827.3006, 827.96155, 822.7844, 983.47736, 936.2461, 890.38885, 866.19543, 1014.8832, 662.50854, 1137.7197]
2025-09-14 09:09:01,728 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [289.0, 287.0, 280.0, 392.0, 349.0, 306.0, 307.0, 379.0, 241.0, 407.0]
2025-09-14 09:09:01,773 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 75/100 (estimated time remaining: 5 hours, 22 minutes, 18 seconds)
2025-09-14 09:19:39,198 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 09:19:39,207 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 09:20:54,118 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 653.62061 ± 441.413
2025-09-14 09:20:54,119 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [767.45215, 1101.2899, 1137.5399, 975.0699, 685.8373, 969.3531, 8.256552, 6.543921, 873.9724, 10.890863]
2025-09-14 09:20:54,120 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [295.0, 406.0, 440.0, 344.0, 258.0, 341.0, 20.0, 19.0, 342.0, 30.0]
2025-09-14 09:20:54,156 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 76/100 (estimated time remaining: 5 hours, 7 minutes, 44 seconds)
2025-09-14 09:31:45,755 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 09:31:45,762 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 09:33:12,946 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 818.31665 ± 172.643
2025-09-14 09:33:12,947 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [748.7286, 817.6903, 571.77045, 930.782, 673.6735, 621.8355, 882.8854, 966.4088, 1183.1749, 786.21747]
2025-09-14 09:33:12,947 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [288.0, 265.0, 198.0, 327.0, 252.0, 238.0, 314.0, 338.0, 399.0, 285.0]
2025-09-14 09:33:12,961 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 77/100 (estimated time remaining: 4 hours, 54 minutes, 40 seconds)
2025-09-14 09:43:54,240 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 09:43:54,248 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 09:44:56,169 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 509.26831 ± 433.824
2025-09-14 09:44:56,170 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1061.3406, 2.106183, 0.6702185, 983.238, 856.7938, 3.6328518, 13.132335, 782.3236, 506.60098, 882.8449]
2025-09-14 09:44:56,170 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [480.0, 15.0, 12.0, 360.0, 290.0, 19.0, 23.0, 276.0, 188.0, 382.0]
2025-09-14 09:44:56,180 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 78/100 (estimated time remaining: 4 hours, 39 minutes, 44 seconds)
2025-09-14 09:56:02,171 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 09:56:02,179 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 09:57:09,698 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 589.70465 ± 419.546
2025-09-14 09:57:09,700 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [438.85962, 1042.0221, 907.8392, 967.28217, 1027.7534, 826.21716, 1.5716375, 4.363729, 678.6388, 2.4984589]
2025-09-14 09:57:09,700 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [160.0, 358.0, 322.0, 356.0, 357.0, 317.0, 22.0, 17.0, 322.0, 17.0]
2025-09-14 09:57:09,714 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 79/100 (estimated time remaining: 4 hours, 26 minutes, 30 seconds)
2025-09-14 10:08:02,100 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 10:08:02,108 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 10:09:38,456 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 895.69708 ± 438.285
2025-09-14 10:09:38,457 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1173.2126, 194.03558, 1215.0375, 311.59467, 1228.9071, 1198.7145, 1207.9121, 1265.031, 211.15178, 951.3741]
2025-09-14 10:09:38,457 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [435.0, 99.0, 386.0, 130.0, 422.0, 399.0, 434.0, 418.0, 102.0, 342.0]
2025-09-14 10:09:38,471 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 80/100 (estimated time remaining: 4 hours, 14 minutes, 34 seconds)
2025-09-14 10:20:08,296 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 10:20:08,305 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 10:21:52,859 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 972.92316 ± 513.571
2025-09-14 10:21:52,860 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1462.051, 1238.3584, -1.343713, 1000.84656, 1771.6396, 831.73596, 836.3312, 1356.4794, 1004.0223, 229.11066]
2025-09-14 10:21:52,860 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [481.0, 412.0, 11.0, 347.0, 561.0, 329.0, 302.0, 486.0, 382.0, 103.0]
2025-09-14 10:21:52,860 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1226 [INFO]: New best (972.92) for latency ExtremeSparseL4U32
2025-09-14 10:21:52,902 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 81/100 (estimated time remaining: 4 hours, 3 minutes, 54 seconds)
2025-09-14 10:32:54,368 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 10:32:54,389 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 10:34:28,151 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 883.60046 ± 372.422
2025-09-14 10:34:28,152 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1081.9902, 496.6009, 844.9741, 1272.008, 929.62305, 784.5589, 1035.3096, 1009.7248, 1368.8999, 12.315824]
2025-09-14 10:34:28,153 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [360.0, 185.0, 338.0, 418.0, 344.0, 291.0, 356.0, 343.0, 488.0, 26.0]
2025-09-14 10:34:28,165 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 82/100 (estimated time remaining: 3 hours, 52 minutes, 45 seconds)
2025-09-14 10:45:08,466 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 10:45:08,474 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 10:46:52,881 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 928.13831 ± 486.037
2025-09-14 10:46:52,882 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1818.0438, 892.88824, 1336.0488, 611.8175, 43.078648, 335.14172, 1021.29083, 1322.4327, 914.36383, 986.2768]
2025-09-14 10:46:52,882 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [648.0, 325.0, 476.0, 231.0, 71.0, 165.0, 352.0, 486.0, 331.0, 350.0]
2025-09-14 10:46:52,902 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 83/100 (estimated time remaining: 3 hours, 43 minutes)
2025-09-14 10:57:40,737 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 10:57:40,747 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 10:58:42,705 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 540.41663 ± 482.912
2025-09-14 10:58:42,706 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [658.57825, 634.3534, 624.7079, 1123.2036, 1037.889, 1298.8575, 0.23514582, 15.359615, 8.858587, 2.1234596]
2025-09-14 10:58:42,706 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [252.0, 261.0, 239.0, 409.0, 380.0, 453.0, 13.0, 27.0, 32.0, 14.0]
2025-09-14 10:58:42,777 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 84/100 (estimated time remaining: 3 hours, 29 minutes, 16 seconds)
2025-09-14 11:09:38,634 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 11:09:38,642 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 11:10:57,826 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 724.42841 ± 516.682
2025-09-14 11:10:57,828 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [631.35754, 1119.6, 883.3672, 796.8794, 774.51904, 5.626201, 0.791777, 235.74951, 1775.488, 1020.90515]
2025-09-14 11:10:57,828 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [234.0, 360.0, 376.0, 303.0, 280.0, 17.0, 15.0, 104.0, 582.0, 343.0]
2025-09-14 11:10:57,883 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 85/100 (estimated time remaining: 3 hours, 16 minutes, 14 seconds)
2025-09-14 11:21:32,184 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 11:21:32,193 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 11:23:04,085 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 730.34802 ± 424.320
2025-09-14 11:23:04,087 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1011.2879, 1149.4193, 4.600099, 1333.5391, 887.7974, 326.5215, 1033.3292, 187.82454, 488.1224, 881.03876]
2025-09-14 11:23:04,087 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [353.0, 402.0, 15.0, 514.0, 328.0, 142.0, 372.0, 190.0, 350.0, 337.0]
2025-09-14 11:23:04,102 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 86/100 (estimated time remaining: 3 hours, 3 minutes, 33 seconds)
2025-09-14 11:34:07,098 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 11:34:07,109 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 11:35:44,875 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 902.14435 ± 319.650
2025-09-14 11:35:44,878 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1374.489, 947.82666, 845.5623, 1056.9856, 1144.2197, 150.65283, 938.1826, 766.7981, 1149.7598, 646.9663]
2025-09-14 11:35:44,878 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [459.0, 332.0, 285.0, 363.0, 418.0, 80.0, 335.0, 286.0, 386.0, 290.0]
2025-09-14 11:35:44,890 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 87/100 (estimated time remaining: 2 hours, 51 minutes, 34 seconds)
2025-09-14 11:46:35,712 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 11:46:35,721 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 11:48:20,651 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 1021.54901 ± 235.847
2025-09-14 11:48:20,654 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1190.8228, 987.5598, 926.0893, 799.2456, 787.80054, 792.481, 1172.6805, 772.9383, 1381.0159, 1404.8566]
2025-09-14 11:48:20,654 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [383.0, 354.0, 315.0, 289.0, 314.0, 275.0, 391.0, 274.0, 443.0, 473.0]
2025-09-14 11:48:20,654 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1226 [INFO]: New best (1021.55) for latency ExtremeSparseL4U32
2025-09-14 11:48:20,669 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 88/100 (estimated time remaining: 2 hours, 39 minutes, 48 seconds)
2025-09-14 11:59:08,983 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 11:59:08,991 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 12:00:41,880 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 878.95886 ± 330.595
2025-09-14 12:00:41,881 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1134.1792, 954.15186, 701.57697, 1172.0381, 787.81714, 160.35802, 1083.6537, 984.3346, 502.2277, 1309.2516]
2025-09-14 12:00:41,881 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [372.0, 314.0, 251.0, 387.0, 284.0, 88.0, 379.0, 344.0, 180.0, 429.0]
2025-09-14 12:00:41,892 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 89/100 (estimated time remaining: 2 hours, 28 minutes, 45 seconds)
2025-09-14 12:11:30,832 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 12:11:30,840 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 12:13:19,509 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 988.88586 ± 399.240
2025-09-14 12:13:19,511 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1377.0327, 1615.5122, 740.92395, 762.8368, 236.10324, 738.99023, 934.348, 1160.0579, 832.1264, 1490.9276]
2025-09-14 12:13:19,511 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [481.0, 559.0, 248.0, 335.0, 118.0, 277.0, 319.0, 444.0, 291.0, 535.0]
2025-09-14 12:13:19,522 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 90/100 (estimated time remaining: 2 hours, 17 minutes, 11 seconds)
2025-09-14 12:24:05,228 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 12:24:05,237 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 12:25:48,710 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 977.56677 ± 193.997
2025-09-14 12:25:48,712 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [928.0481, 1074.8531, 689.95404, 832.54626, 1300.5039, 907.39343, 999.2753, 964.06445, 771.9893, 1307.04]
2025-09-14 12:25:48,712 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [332.0, 417.0, 282.0, 304.0, 413.0, 325.0, 334.0, 323.0, 264.0, 432.0]
2025-09-14 12:25:48,726 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 91/100 (estimated time remaining: 2 hours, 5 minutes, 29 seconds)
2025-09-14 12:36:35,057 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 12:36:35,064 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 12:37:56,933 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 766.21014 ± 548.285
2025-09-14 12:37:56,933 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1412.8898, 1407.6859, 3.7141967, 2.0011563, 2.850578, 638.7761, 1180.8599, 788.575, 1109.5367, 1115.2123]
2025-09-14 12:37:56,933 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [478.0, 456.0, 16.0, 11.0, 26.0, 233.0, 397.0, 289.0, 401.0, 392.0]
2025-09-14 12:37:56,946 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 92/100 (estimated time remaining: 1 hour, 51 minutes, 57 seconds)
2025-09-14 12:48:53,486 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 12:48:53,497 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 12:49:53,707 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 558.46155 ± 567.299
2025-09-14 12:49:53,707 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [-0.26766312, 1487.1166, 376.1138, 1090.1685, 622.26794, 534.3285, 1456.9736, 3.55525, 10.535676, 3.8228247]
2025-09-14 12:49:53,707 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [17.0, 490.0, 151.0, 355.0, 233.0, 211.0, 471.0, 27.0, 26.0, 15.0]
2025-09-14 12:49:53,719 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 93/100 (estimated time remaining: 1 hour, 38 minutes, 28 seconds)
2025-09-14 13:00:31,435 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 13:00:31,445 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 13:01:48,244 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 696.03430 ± 491.211
2025-09-14 13:01:48,246 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1097.9581, 5.056968, 952.9801, 869.6394, 600.3387, 1249.9851, 889.71313, -1.3603863, 2.5600238, 1293.4711]
2025-09-14 13:01:48,246 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [363.0, 21.0, 341.0, 304.0, 255.0, 492.0, 302.0, 14.0, 23.0, 444.0]
2025-09-14 13:01:48,258 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 25 minutes, 32 seconds)
2025-09-14 13:12:36,495 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 13:12:36,503 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 13:14:35,027 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 1149.81543 ± 345.439
2025-09-14 13:14:35,028 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1742.5839, 1286.9885, 1575.5938, 950.8156, 1024.8525, 1088.1101, 1501.243, 865.9515, 849.1111, 612.9045]
2025-09-14 13:14:35,028 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [545.0, 458.0, 516.0, 334.0, 351.0, 410.0, 502.0, 322.0, 312.0, 215.0]
2025-09-14 13:14:35,028 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1226 [INFO]: New best (1149.82) for latency ExtremeSparseL4U32
2025-09-14 13:14:35,041 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 13 minutes, 30 seconds)
2025-09-14 13:25:45,292 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 13:25:45,303 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 13:27:10,374 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 781.36243 ± 306.346
2025-09-14 13:27:10,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [709.265, 883.9875, 1102.795, 256.74695, 166.5186, 897.8339, 1039.2981, 1053.7258, 883.32935, 820.1247]
2025-09-14 13:27:10,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [268.0, 307.0, 379.0, 119.0, 92.0, 324.0, 390.0, 364.0, 294.0, 308.0]
2025-09-14 13:27:10,391 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 1 minute, 21 seconds)
2025-09-14 13:37:43,762 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 13:37:43,781 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 13:39:11,235 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 807.14636 ± 392.151
2025-09-14 13:39:11,236 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1032.9851, 5.122996, 849.1443, 1098.5188, 1007.0069, 121.176186, 876.83124, 1089.5929, 782.46594, 1208.6198]
2025-09-14 13:39:11,236 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [340.0, 29.0, 312.0, 359.0, 348.0, 65.0, 368.0, 364.0, 303.0, 416.0]
2025-09-14 13:39:11,248 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 97/100 (estimated time remaining: 48 minutes, 59 seconds)
2025-09-14 13:49:53,775 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 13:49:53,784 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 13:51:13,708 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 722.68774 ± 520.318
2025-09-14 13:51:13,709 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1293.98, 501.9344, 843.2833, 1044.4064, -1.538255, 3.6975768, 1328.9365, 1058.2141, 1147.1268, 6.836506]
2025-09-14 13:51:13,709 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [423.0, 259.0, 312.0, 343.0, 10.0, 26.0, 513.0, 361.0, 389.0, 29.0]
2025-09-14 13:51:13,725 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 98/100 (estimated time remaining: 36 minutes, 48 seconds)
2025-09-14 14:02:13,201 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 14:02:13,208 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 14:04:10,607 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 1110.83508 ± 204.764
2025-09-14 14:04:10,609 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1083.7391, 1010.02747, 1409.7438, 1214.3413, 965.0673, 901.74554, 969.9766, 877.6518, 1519.563, 1156.4951]
2025-09-14 14:04:10,609 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [366.0, 350.0, 481.0, 429.0, 324.0, 307.0, 424.0, 293.0, 502.0, 370.0]
2025-09-14 14:04:10,626 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 99/100 (estimated time remaining: 24 minutes, 56 seconds)
2025-09-14 14:14:53,042 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 14:14:53,050 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 14:16:34,256 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 958.02441 ± 298.309
2025-09-14 14:16:34,259 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [970.1187, 748.70465, 1328.2657, 228.14891, 968.0048, 1081.5647, 1159.3945, 892.0112, 909.15936, 1294.8711]
2025-09-14 14:16:34,259 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [338.0, 278.0, 446.0, 127.0, 329.0, 393.0, 389.0, 302.0, 318.0, 440.0]
2025-09-14 14:16:34,274 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 100/100 (estimated time remaining: 12 minutes, 23 seconds)
2025-09-14 14:27:06,758 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 14:27:06,765 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 14:28:31,264 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 765.25745 ± 386.180
2025-09-14 14:28:31,265 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1042.8439, 1314.7338, 1202.481, 1111.8708, 352.1136, 194.07082, 685.98413, 479.85876, 936.04913, 332.56824]
2025-09-14 14:28:31,265 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [335.0, 454.0, 374.0, 384.0, 143.0, 104.0, 249.0, 220.0, 380.0, 154.0]
2025-09-14 14:28:31,282 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1251 [DEBUG]: Training session finished
