2025-09-13 18:23:08,992 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noiseperc15-walker2d/ExtremeSparseL4U32-mbpac_memdelay
2025-09-13 18:23:08,992 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noiseperc15-walker2d/ExtremeSparseL4U32-mbpac_memdelay
2025-09-13 18:23:08,992 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x1466489c5510>}
2025-09-13 18:23:08,992 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1111 [DEBUG]: using device: cuda
2025-09-13 18:23:08,996 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1133 [INFO]: Creating new trainer
2025-09-13 18:23:09,013 baseline-mbpac-noiseperc15-walker2d:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-09-13 18:23:09,013 baseline-mbpac-noiseperc15-walker2d:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-13 18:23:09,021 baseline-mbpac-noiseperc15-walker2d:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=6, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 384, batch_first=True)
)
2025-09-13 18:23:10,110 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1194 [DEBUG]: Starting training session...
2025-09-13 18:23:10,110 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 1/100
2025-09-13 18:34:08,915 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:34:08,916 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:35:17,424 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 306.90692 ± 103.191
2025-09-13 18:35:17,440 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [321.37567, 244.1625, 231.3604, 301.22217, 125.83298, 343.9698, 499.07806, 458.59354, 268.19983, 275.27423]
2025-09-13 18:35:17,440 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [184.0, 146.0, 138.0, 211.0, 270.0, 216.0, 407.0, 367.0, 170.0, 163.0]
2025-09-13 18:35:17,440 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1226 [INFO]: New best (306.91) for latency ExtremeSparseL4U32
2025-09-13 18:35:17,457 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 2/100 (estimated time remaining: 20 hours, 7 seconds)
2025-09-13 18:46:07,529 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:46:07,530 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:46:57,118 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 145.56268 ± 100.371
2025-09-13 18:46:57,119 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [45.905502, 23.131968, 144.0633, 150.04745, 182.14757, 287.01743, 3.9655743, 164.8033, 328.95795, 125.58682]
2025-09-13 18:46:57,119 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [185.0, 45.0, 159.0, 269.0, 192.0, 162.0, 14.0, 166.0, 218.0, 248.0]
2025-09-13 18:46:57,126 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 3/100 (estimated time remaining: 19 hours, 25 minutes, 23 seconds)
2025-09-13 18:57:45,327 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:57:45,329 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:58:14,645 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 48.43324 ± 90.174
2025-09-13 18:58:14,645 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [34.446297, 18.50096, 12.075007, 14.678183, 9.609369, 36.269684, 313.9277, -23.00913, 40.73065, 27.103708]
2025-09-13 18:58:14,645 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [60.0, 47.0, 29.0, 85.0, 74.0, 163.0, 242.0, 163.0, 74.0, 44.0]
2025-09-13 18:58:14,650 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 4/100 (estimated time remaining: 18 hours, 54 minutes, 6 seconds)
2025-09-13 19:08:48,795 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:08:48,797 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:09:39,645 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 160.47209 ± 123.198
2025-09-13 19:09:39,645 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [338.42987, 246.91751, 53.19146, 225.99652, 119.48867, 15.876102, 26.306551, 9.7188, 246.96439, 321.83105]
2025-09-13 19:09:39,646 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [433.0, 207.0, 110.0, 142.0, 118.0, 61.0, 163.0, 44.0, 150.0, 290.0]
2025-09-13 19:09:39,662 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 5/100 (estimated time remaining: 18 hours, 35 minutes, 49 seconds)
2025-09-13 19:20:19,718 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:20:19,725 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:20:53,460 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 70.44646 ± 32.281
2025-09-13 19:20:53,460 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [61.418365, 37.498962, 80.917114, 59.03565, 84.22118, 69.69758, 21.894709, 79.06749, 149.8219, 60.8917]
2025-09-13 19:20:53,460 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [165.0, 68.0, 122.0, 105.0, 125.0, 125.0, 56.0, 147.0, 173.0, 65.0]
2025-09-13 19:20:53,471 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 6/100 (estimated time remaining: 18 hours, 16 minutes, 43 seconds)
2025-09-13 19:31:31,069 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:31:31,077 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:32:02,549 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 100.72173 ± 95.149
2025-09-13 19:32:02,549 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [107.15486, 46.47044, 27.273571, 67.89226, 57.310715, 335.21634, 2.2094076, 104.12658, 209.77826, 49.78478]
2025-09-13 19:32:02,549 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [122.0, 59.0, 37.0, 175.0, 80.0, 219.0, 14.0, 136.0, 133.0, 70.0]
2025-09-13 19:32:02,561 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 7/100 (estimated time remaining: 17 hours, 46 minutes, 55 seconds)
2025-09-13 19:42:36,415 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:42:36,423 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:43:10,947 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 134.64241 ± 112.863
2025-09-13 19:43:10,947 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [271.15756, 50.258858, 375.38953, 26.170856, 66.07717, 164.25859, 48.24161, 222.60074, 73.62185, 48.647312]
2025-09-13 19:43:10,947 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [172.0, 77.0, 238.0, 40.0, 102.0, 169.0, 57.0, 131.0, 116.0, 76.0]
2025-09-13 19:43:10,964 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 8/100 (estimated time remaining: 17 hours, 25 minutes, 53 seconds)
2025-09-13 19:53:55,938 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:53:55,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:54:18,631 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 73.08662 ± 76.447
2025-09-13 19:54:18,631 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [244.73918, 7.69226, 70.648155, 59.562397, 86.11224, 44.983315, 8.862838, 10.132057, 15.05873, 183.075]
2025-09-13 19:54:18,631 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [146.0, 38.0, 81.0, 103.0, 100.0, 58.0, 37.0, 21.0, 30.0, 144.0]
2025-09-13 19:54:18,652 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 9/100 (estimated time remaining: 17 hours, 11 minutes, 37 seconds)
2025-09-13 20:04:50,860 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:04:50,876 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:05:34,172 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 179.97134 ± 134.914
2025-09-13 20:05:34,172 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [102.55909, 117.84276, 22.760061, 506.5493, 269.32034, 196.00696, 50.19179, 77.4017, 243.95961, 213.12196]
2025-09-13 20:05:34,172 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [197.0, 238.0, 43.0, 282.0, 174.0, 122.0, 71.0, 75.0, 162.0, 121.0]
2025-09-13 20:05:34,178 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 10/100 (estimated time remaining: 16 hours, 57 minutes, 32 seconds)
2025-09-13 20:16:13,810 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:16:13,819 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:17:02,301 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 209.35071 ± 167.838
2025-09-13 20:17:02,302 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [276.9899, 367.16235, 31.134565, 84.22015, 37.13942, 274.85583, 343.54776, 30.351252, 541.4077, 106.69819]
2025-09-13 20:17:02,302 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [170.0, 324.0, 38.0, 120.0, 66.0, 172.0, 261.0, 79.0, 285.0, 138.0]
2025-09-13 20:17:02,310 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 11/100 (estimated time remaining: 16 hours, 50 minutes, 39 seconds)
2025-09-13 20:27:59,540 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:27:59,548 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:28:54,553 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 239.46094 ± 280.432
2025-09-13 20:28:54,553 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [40.979134, 258.32437, 39.40419, 505.99728, 15.50946, 567.4841, 43.62796, 840.2315, 51.97893, 31.072235]
2025-09-13 20:28:54,553 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [56.0, 199.0, 51.0, 357.0, 25.0, 402.0, 90.0, 589.0, 72.0, 45.0]
2025-09-13 20:28:54,562 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 12/100 (estimated time remaining: 16 hours, 52 minutes, 13 seconds)
2025-09-13 20:39:20,775 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:39:20,783 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:40:01,698 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 217.28694 ± 187.325
2025-09-13 20:40:01,698 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [274.542, 422.8373, 501.44977, 66.84986, 74.64067, 21.644634, 458.79254, 308.78214, 30.052546, 13.277948]
2025-09-13 20:40:01,698 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [163.0, 244.0, 229.0, 81.0, 56.0, 34.0, 322.0, 167.0, 59.0, 22.0]
2025-09-13 20:40:01,704 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 13/100 (estimated time remaining: 16 hours, 40 minutes, 29 seconds)
2025-09-13 20:50:43,388 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:50:43,396 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:51:35,224 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 285.00491 ± 207.311
2025-09-13 20:51:35,224 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [74.07531, 348.73737, 418.76816, 231.16737, 615.26697, 556.2315, 59.426598, 29.745367, 431.9979, 84.63226]
2025-09-13 20:51:35,225 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [83.0, 176.0, 219.0, 147.0, 324.0, 368.0, 72.0, 34.0, 241.0, 100.0]
2025-09-13 20:51:35,233 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 14/100 (estimated time remaining: 16 hours, 36 minutes, 36 seconds)
2025-09-13 21:02:06,598 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:02:06,605 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:02:50,539 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 266.83017 ± 235.074
2025-09-13 21:02:50,539 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [397.69846, 74.29764, 205.37308, 14.117614, 213.17967, 400.95486, 24.520552, 33.891705, 653.4067, 650.8617]
2025-09-13 21:02:50,539 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [201.0, 70.0, 160.0, 20.0, 139.0, 250.0, 39.0, 37.0, 275.0, 277.0]
2025-09-13 21:02:50,553 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 15/100 (estimated time remaining: 16 hours, 25 minutes, 5 seconds)
2025-09-13 21:13:58,393 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:13:58,401 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:14:28,826 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 175.70773 ± 179.989
2025-09-13 21:14:28,826 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [164.26357, 101.563225, 599.8298, 23.643568, 10.554767, 139.01059, 416.08984, 18.980738, 173.42062, 109.72054]
2025-09-13 21:14:28,826 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [125.0, 90.0, 239.0, 41.0, 20.0, 106.0, 182.0, 36.0, 113.0, 81.0]
2025-09-13 21:14:28,837 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 16/100 (estimated time remaining: 16 hours, 16 minutes, 30 seconds)
2025-09-13 21:24:44,257 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:24:44,264 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:25:26,692 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 234.92021 ± 196.209
2025-09-13 21:25:26,692 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [20.421774, 487.65707, 509.6301, 51.68206, 11.1574955, 329.11075, 230.64267, 5.259596, 235.59483, 468.0459]
2025-09-13 21:25:26,692 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [29.0, 361.0, 280.0, 67.0, 19.0, 140.0, 152.0, 16.0, 135.0, 231.0]
2025-09-13 21:25:26,704 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 17/100 (estimated time remaining: 15 hours, 49 minutes, 47 seconds)
2025-09-13 21:36:11,500 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:36:11,507 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:37:07,162 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 312.04205 ± 105.269
2025-09-13 21:37:07,162 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [324.8593, 270.74075, 442.8657, 221.07863, 398.58887, 268.47754, 427.83328, 98.79601, 248.8613, 418.31918]
2025-09-13 21:37:07,162 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [230.0, 155.0, 201.0, 127.0, 228.0, 171.0, 207.0, 112.0, 201.0, 263.0]
2025-09-13 21:37:07,162 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1226 [INFO]: New best (312.04) for latency ExtremeSparseL4U32
2025-09-13 21:37:07,171 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 18/100 (estimated time remaining: 15 hours, 47 minutes, 42 seconds)
2025-09-13 21:47:36,258 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:47:36,265 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:48:40,081 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 462.82162 ± 321.093
2025-09-13 21:48:40,083 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [14.0304165, 278.26337, 617.3265, 511.29892, 286.18903, 416.22156, 1174.553, 240.53413, 833.42194, 256.37738]
2025-09-13 21:48:40,083 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [24.0, 218.0, 249.0, 193.0, 136.0, 244.0, 471.0, 162.0, 314.0, 132.0]
2025-09-13 21:48:40,083 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1226 [INFO]: New best (462.82) for latency ExtremeSparseL4U32
2025-09-13 21:48:40,093 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 19/100 (estimated time remaining: 15 hours, 36 minutes, 7 seconds)
2025-09-13 21:59:35,988 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:59:35,997 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 22:00:05,017 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 151.50259 ± 122.394
2025-09-13 22:00:05,017 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [210.63115, 11.21928, 242.69727, 127.6698, 228.25684, 58.858883, 8.948248, 8.626891, 389.32645, 228.79114]
2025-09-13 22:00:05,017 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [137.0, 25.0, 156.0, 124.0, 181.0, 58.0, 33.0, 20.0, 158.0, 99.0]
2025-09-13 22:00:05,023 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 20/100 (estimated time remaining: 15 hours, 27 minutes, 18 seconds)
2025-09-13 22:10:30,621 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 22:10:30,630 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 22:11:16,787 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 321.18256 ± 246.879
2025-09-13 22:11:16,787 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [218.28531, 254.73332, 488.85437, 676.22705, 302.51303, 29.952824, 711.83624, 480.09158, -2.226794, 51.55887]
2025-09-13 22:11:16,787 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [117.0, 138.0, 306.0, 259.0, 133.0, 32.0, 283.0, 214.0, 11.0, 58.0]
2025-09-13 22:11:16,797 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 21/100 (estimated time remaining: 15 hours, 8 minutes, 47 seconds)
2025-09-13 22:22:12,460 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 22:22:12,464 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 22:23:00,233 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 336.39450 ± 199.878
2025-09-13 22:23:00,234 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [13.965065, 329.62323, 579.78864, 314.6589, 782.4689, 286.42038, 337.71497, 270.8308, 273.49988, 174.97433]
2025-09-13 22:23:00,234 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [31.0, 152.0, 227.0, 151.0, 310.0, 171.0, 147.0, 181.0, 131.0, 141.0]
2025-09-13 22:23:00,240 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 22/100 (estimated time remaining: 15 hours, 9 minutes, 25 seconds)
2025-09-13 22:33:36,532 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 22:33:36,537 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 22:34:21,439 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 343.45245 ± 309.959
2025-09-13 22:34:21,440 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [628.94855, 375.67465, 239.02333, 740.5842, 163.39508, 927.45764, 332.99377, 3.4033265, 12.2333975, 10.810473]
2025-09-13 22:34:21,440 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [242.0, 169.0, 122.0, 299.0, 106.0, 351.0, 182.0, 14.0, 31.0, 25.0]
2025-09-13 22:34:21,449 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 23/100 (estimated time remaining: 14 hours, 52 minutes, 54 seconds)
2025-09-13 22:45:07,191 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 22:45:07,193 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 22:46:00,312 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 360.06122 ± 351.593
2025-09-13 22:46:00,312 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [405.6001, 20.896486, 404.82834, 2.4607825, 391.95248, 8.629192, 2.1368334, 1132.7506, 549.24445, 682.11273]
2025-09-13 22:46:00,312 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [213.0, 26.0, 175.0, 14.0, 170.0, 21.0, 17.0, 603.0, 276.0, 267.0]
2025-09-13 22:46:00,318 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 24/100 (estimated time remaining: 14 hours, 42 minutes, 59 seconds)
2025-09-13 22:56:27,304 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 22:56:27,306 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 22:57:10,295 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 298.24927 ± 266.999
2025-09-13 22:57:10,295 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [335.23758, 825.99194, 599.2303, 527.61127, 123.282936, 7.3986173, 3.9752557, 301.42133, 5.0328465, 253.31044]
2025-09-13 22:57:10,295 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [170.0, 357.0, 243.0, 241.0, 113.0, 21.0, 18.0, 153.0, 31.0, 122.0]
2025-09-13 22:57:10,305 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 25/100 (estimated time remaining: 14 hours, 27 minutes, 44 seconds)
2025-09-13 23:07:51,806 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 23:07:51,809 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 23:08:46,975 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 353.05573 ± 235.940
2025-09-13 23:08:46,976 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [369.12302, 451.08716, 270.00052, 3.3035948, 819.4805, 371.8855, 147.0027, 685.5635, 163.36295, 249.74776]
2025-09-13 23:08:46,976 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [170.0, 203.0, 193.0, 24.0, 475.0, 161.0, 111.0, 277.0, 123.0, 139.0]
2025-09-13 23:08:46,989 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 26/100 (estimated time remaining: 14 hours, 22 minutes, 32 seconds)
2025-09-13 23:19:37,516 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 23:19:37,518 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 23:20:30,610 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 398.18332 ± 246.883
2025-09-13 23:20:30,610 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [270.39508, 350.56555, 659.97266, 3.6127813, 7.1969, 477.80563, 337.41336, 622.3577, 783.58527, 468.92813]
2025-09-13 23:20:30,610 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [186.0, 164.0, 223.0, 19.0, 19.0, 233.0, 165.0, 269.0, 328.0, 205.0]
2025-09-13 23:20:30,619 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 27/100 (estimated time remaining: 14 hours, 11 minutes, 5 seconds)
2025-09-13 23:31:09,358 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 23:31:09,360 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 23:32:17,099 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 571.33643 ± 232.812
2025-09-13 23:32:17,102 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [901.4382, 613.7993, 677.37787, 243.24815, 734.5775, 205.1347, 728.42773, 293.05746, 789.3282, 526.97534]
2025-09-13 23:32:17,102 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [349.0, 228.0, 240.0, 150.0, 284.0, 113.0, 260.0, 134.0, 277.0, 233.0]
2025-09-13 23:32:17,102 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1226 [INFO]: New best (571.34) for latency ExtremeSparseL4U32
2025-09-13 23:32:17,109 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 28/100 (estimated time remaining: 14 hours, 5 minutes, 44 seconds)
2025-09-13 23:42:53,483 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 23:42:53,493 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 23:43:57,345 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 531.12469 ± 255.922
2025-09-13 23:43:57,348 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [594.15875, 635.85406, 866.794, 572.28687, 684.82684, 713.72595, 726.22675, 8.980272, 206.21376, 302.17963]
2025-09-13 23:43:57,348 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [222.0, 246.0, 308.0, 226.0, 269.0, 266.0, 271.0, 22.0, 154.0, 183.0]
2025-09-13 23:43:57,357 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 29/100 (estimated time remaining: 13 hours, 54 minutes, 29 seconds)
2025-09-13 23:54:44,796 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 23:54:44,812 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 23:55:53,747 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 532.39471 ± 324.598
2025-09-13 23:55:53,775 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [726.0401, 547.84875, 4.7734876, 212.7026, 832.97406, 1037.206, 434.3213, 930.6118, 378.55228, 218.91628]
2025-09-13 23:55:53,775 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [348.0, 217.0, 16.0, 162.0, 323.0, 396.0, 220.0, 343.0, 202.0, 123.0]
2025-09-13 23:55:53,782 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 30/100 (estimated time remaining: 13 hours, 53 minutes, 53 seconds)
2025-09-14 00:06:32,142 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 00:06:32,151 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 00:07:40,778 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 525.53625 ± 346.142
2025-09-14 00:07:40,780 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [339.88815, 841.9163, 648.61957, 967.9959, 268.3812, 429.12405, 14.614294, 776.7614, 9.882054, 958.18036]
2025-09-14 00:07:40,781 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [163.0, 330.0, 369.0, 378.0, 130.0, 187.0, 30.0, 343.0, 24.0, 364.0]
2025-09-14 00:07:40,796 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 31/100 (estimated time remaining: 13 hours, 44 minutes, 33 seconds)
2025-09-14 00:18:17,716 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 00:18:17,723 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 00:19:36,230 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 654.27533 ± 361.181
2025-09-14 00:19:36,232 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [475.34537, 988.2572, 433.34186, 860.19495, 816.17737, 923.13916, 913.7968, 33.67672, 51.038113, 1047.7856]
2025-09-14 00:19:36,232 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [214.0, 358.0, 165.0, 334.0, 284.0, 326.0, 414.0, 44.0, 49.0, 401.0]
2025-09-14 00:19:36,232 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1226 [INFO]: New best (654.28) for latency ExtremeSparseL4U32
2025-09-14 00:19:36,238 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 32/100 (estimated time remaining: 13 hours, 35 minutes, 29 seconds)
2025-09-14 00:30:20,945 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 00:30:20,968 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 00:31:39,051 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 681.65076 ± 197.376
2025-09-14 00:31:39,078 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [669.931, 656.3262, 695.99426, 571.9055, 805.6405, 581.542, 307.80597, 687.14844, 699.15826, 1141.0553]
2025-09-14 00:31:39,078 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [233.0, 283.0, 270.0, 241.0, 288.0, 223.0, 128.0, 274.0, 260.0, 421.0]
2025-09-14 00:31:39,079 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1226 [INFO]: New best (681.65) for latency ExtremeSparseL4U32
2025-09-14 00:31:39,091 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 33/100 (estimated time remaining: 13 hours, 27 minutes, 22 seconds)
2025-09-14 00:42:25,777 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 00:42:25,779 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 00:43:58,024 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 789.99915 ± 353.657
2025-09-14 00:43:58,030 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [372.6584, 1140.0109, 616.9252, 830.34326, 1173.4943, 1151.554, 909.65344, 924.7049, 7.7443395, 772.9029]
2025-09-14 00:43:58,030 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [176.0, 432.0, 227.0, 308.0, 460.0, 404.0, 369.0, 362.0, 20.0, 336.0]
2025-09-14 00:43:58,030 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1226 [INFO]: New best (790.00) for latency ExtremeSparseL4U32
2025-09-14 00:43:58,040 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 34/100 (estimated time remaining: 13 hours, 24 minutes, 9 seconds)
2025-09-14 00:54:39,585 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 00:54:39,588 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 00:55:45,522 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 535.72131 ± 406.504
2025-09-14 00:55:45,525 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [863.1863, 7.170476, 807.02905, 1033.8142, 306.48166, 445.64017, 6.974969, 8.334954, 934.9876, 943.5938]
2025-09-14 00:55:45,525 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [371.0, 30.0, 316.0, 373.0, 164.0, 212.0, 19.0, 23.0, 356.0, 337.0]
2025-09-14 00:55:45,536 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 35/100 (estimated time remaining: 13 hours, 10 minutes, 11 seconds)
2025-09-14 01:06:25,442 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 01:06:25,445 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 01:07:50,683 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 734.53973 ± 504.031
2025-09-14 01:07:50,687 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [1031.6576, 1200.1975, 111.697495, 245.46202, 846.10364, 1542.5989, 1065.161, 270.14496, 5.063274, 1027.3108]
2025-09-14 01:07:50,687 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [347.0, 419.0, 118.0, 136.0, 345.0, 580.0, 424.0, 153.0, 21.0, 354.0]
2025-09-14 01:07:50,702 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 36/100 (estimated time remaining: 13 hours, 2 minutes, 8 seconds)
2025-09-14 01:18:40,638 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 01:18:40,640 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 01:20:13,229 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 890.86584 ± 164.035
2025-09-14 01:20:13,232 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [876.37366, 893.7991, 993.5507, 811.70355, 883.55975, 1228.9939, 574.7105, 829.3871, 779.21466, 1037.3656]
2025-09-14 01:20:13,233 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [308.0, 337.0, 329.0, 330.0, 283.0, 417.0, 224.0, 315.0, 283.0, 356.0]
2025-09-14 01:20:13,233 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1226 [INFO]: New best (890.87) for latency ExtremeSparseL4U32
2025-09-14 01:20:13,242 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 37/100 (estimated time remaining: 12 hours, 55 minutes, 53 seconds)
2025-09-14 01:30:54,031 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 01:30:54,034 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 01:32:09,129 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 618.88934 ± 198.607
2025-09-14 01:32:09,131 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [1022.1023, 759.7251, 728.9444, 454.3428, 358.94684, 804.5621, 382.35077, 526.968, 539.217, 611.7346]
2025-09-14 01:32:09,131 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [367.0, 295.0, 298.0, 203.0, 193.0, 308.0, 188.0, 222.0, 242.0, 240.0]
2025-09-14 01:32:09,140 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 38/100 (estimated time remaining: 12 hours, 42 minutes, 18 seconds)
2025-09-14 01:43:01,018 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 01:43:01,021 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 01:44:06,739 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 547.47137 ± 293.832
2025-09-14 01:44:06,742 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [13.619308, 903.7699, 3.213815, 661.19226, 754.282, 526.82153, 636.4789, 840.70856, 614.10925, 520.5181]
2025-09-14 01:44:06,742 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [26.0, 326.0, 15.0, 265.0, 290.0, 244.0, 256.0, 306.0, 252.0, 221.0]
2025-09-14 01:44:06,752 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 39/100 (estimated time remaining: 12 hours, 25 minutes, 48 seconds)
2025-09-14 01:54:50,055 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 01:54:50,063 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 01:56:29,600 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 915.08752 ± 223.304
2025-09-14 01:56:29,601 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [885.05115, 1166.2037, 574.67944, 752.2437, 783.29755, 1104.1145, 1274.1562, 641.75806, 1098.3988, 870.972]
2025-09-14 01:56:29,601 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [334.0, 415.0, 244.0, 311.0, 316.0, 395.0, 429.0, 251.0, 366.0, 343.0]
2025-09-14 01:56:29,601 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1226 [INFO]: New best (915.09) for latency ExtremeSparseL4U32
2025-09-14 01:56:29,614 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 40/100 (estimated time remaining: 12 hours, 20 minutes, 57 seconds)
2025-09-14 02:06:58,021 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 02:06:58,029 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 02:08:24,855 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 791.69177 ± 277.191
2025-09-14 02:08:24,858 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [1157.8755, 927.17615, 810.8859, 105.08694, 1033.1586, 922.5366, 821.5864, 639.10266, 606.5971, 892.91156]
2025-09-14 02:08:24,858 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [415.0, 342.0, 289.0, 106.0, 360.0, 346.0, 308.0, 246.0, 243.0, 310.0]
2025-09-14 02:08:24,867 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 41/100 (estimated time remaining: 12 hours, 6 minutes, 49 seconds)
2025-09-14 02:19:05,271 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 02:19:05,280 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 02:20:37,912 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 829.45349 ± 292.364
2025-09-14 02:20:37,924 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [330.20285, 717.473, 1086.5247, 980.99976, 820.1942, 254.86794, 1140.6844, 1003.738, 955.7628, 1004.0877]
2025-09-14 02:20:37,925 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [159.0, 260.0, 409.0, 350.0, 306.0, 140.0, 387.0, 369.0, 347.0, 343.0]
2025-09-14 02:20:37,943 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 42/100 (estimated time remaining: 11 hours, 52 minutes, 51 seconds)
2025-09-14 02:31:37,866 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 02:31:37,869 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 02:33:20,873 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 968.05457 ± 99.023
2025-09-14 02:33:20,875 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [956.0747, 969.86774, 1040.7218, 1204.2733, 966.41327, 990.73114, 893.6361, 809.92163, 893.09644, 955.80994]
2025-09-14 02:33:20,875 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [355.0, 382.0, 352.0, 401.0, 329.0, 350.0, 313.0, 321.0, 330.0, 375.0]
2025-09-14 02:33:20,875 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1226 [INFO]: New best (968.05) for latency ExtremeSparseL4U32
2025-09-14 02:33:20,881 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 43/100 (estimated time remaining: 11 hours, 49 minutes, 52 seconds)
2025-09-14 02:43:55,395 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 02:43:55,397 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 02:45:28,737 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 840.00195 ± 267.976
2025-09-14 02:45:28,738 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [877.072, 1214.8301, 1247.2244, 828.454, 552.95953, 1098.633, 478.03076, 671.4335, 527.74475, 903.63794]
2025-09-14 02:45:28,738 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [316.0, 458.0, 429.0, 300.0, 225.0, 357.0, 208.0, 227.0, 257.0, 357.0]
2025-09-14 02:45:28,745 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 44/100 (estimated time remaining: 11 hours, 39 minutes, 34 seconds)
2025-09-14 02:56:10,891 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 02:56:10,893 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 02:58:26,373 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 1322.31702 ± 305.352
2025-09-14 02:58:26,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [1485.0, 1438.6627, 2130.7795, 1296.6753, 1225.6893, 958.3954, 1167.657, 1228.0459, 1173.1871, 1119.0781]
2025-09-14 02:58:26,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [488.0, 485.0, 697.0, 444.0, 408.0, 363.0, 423.0, 433.0, 387.0, 373.0]
2025-09-14 02:58:26,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1226 [INFO]: New best (1322.32) for latency ExtremeSparseL4U32
2025-09-14 02:58:26,388 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 45/100 (estimated time remaining: 11 hours, 33 minutes, 47 seconds)
2025-09-14 03:09:02,442 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 03:09:02,445 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 03:10:14,373 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 664.64508 ± 504.456
2025-09-14 03:10:14,374 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [274.01447, 1032.8622, 1334.7351, 819.5939, 8.296831, 10.078769, 15.235619, 1104.0557, 863.28534, 1184.293]
2025-09-14 03:10:14,374 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [134.0, 354.0, 446.0, 306.0, 29.0, 27.0, 33.0, 372.0, 316.0, 432.0]
2025-09-14 03:10:14,391 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 46/100 (estimated time remaining: 11 hours, 20 minutes, 4 seconds)
2025-09-14 03:20:55,604 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 03:20:55,607 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 03:22:42,652 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 1033.07922 ± 263.383
2025-09-14 03:22:42,654 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [877.52344, 1138.2086, 1204.1969, 989.2208, 1113.3223, 1190.9318, 1300.8169, 1090.5474, 1110.519, 315.50473]
2025-09-14 03:22:42,654 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [318.0, 366.0, 412.0, 354.0, 363.0, 429.0, 439.0, 376.0, 374.0, 213.0]
2025-09-14 03:22:42,662 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 47/100 (estimated time remaining: 11 hours, 10 minutes, 26 seconds)
2025-09-14 03:33:33,340 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 03:33:33,342 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 03:35:14,111 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 918.41650 ± 145.966
2025-09-14 03:35:14,119 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [639.09894, 1240.7948, 900.6301, 940.42566, 928.66626, 902.12415, 1017.7044, 813.47406, 965.584, 835.6634]
2025-09-14 03:35:14,119 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [225.0, 408.0, 352.0, 340.0, 384.0, 329.0, 349.0, 298.0, 341.0, 311.0]
2025-09-14 03:35:14,127 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 48/100 (estimated time remaining: 10 hours, 56 minutes)
2025-09-14 03:46:03,958 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 03:46:03,965 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 03:47:44,474 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 963.74207 ± 375.903
2025-09-14 03:47:44,475 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [1228.5702, 857.5211, 1262.9646, 1191.3259, 1317.0337, 1034.8662, 1206.7761, 701.2566, 7.466165, 829.6405]
2025-09-14 03:47:44,475 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [405.0, 296.0, 411.0, 400.0, 426.0, 370.0, 398.0, 271.0, 35.0, 317.0]
2025-09-14 03:47:44,485 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 49/100 (estimated time remaining: 10 hours, 47 minutes, 31 seconds)
2025-09-14 03:58:23,978 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 03:58:23,981 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 04:00:26,049 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 1158.70972 ± 338.560
2025-09-14 04:00:26,050 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [1040.9286, 949.5111, 1451.8984, 1519.5273, 1053.734, 1450.7041, 753.3822, 607.4341, 1704.0571, 1055.922]
2025-09-14 04:00:26,050 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [406.0, 339.0, 493.0, 569.0, 368.0, 513.0, 290.0, 243.0, 555.0, 398.0]
2025-09-14 04:00:26,059 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 50/100 (estimated time remaining: 10 hours, 32 minutes, 20 seconds)
2025-09-14 04:11:15,318 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 04:11:15,321 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 04:12:48,712 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 926.36133 ± 380.020
2025-09-14 04:12:48,714 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [1472.6658, 955.54614, 1121.0624, 1276.739, 1217.9385, 850.2976, 804.1715, 719.92255, 17.449968, 827.81976]
2025-09-14 04:12:48,714 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [480.0, 349.0, 367.0, 434.0, 395.0, 289.0, 288.0, 256.0, 32.0, 289.0]
2025-09-14 04:12:48,749 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 51/100 (estimated time remaining: 10 hours, 25 minutes, 43 seconds)
2025-09-14 04:23:17,129 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 04:23:17,132 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 04:25:05,129 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 1025.47192 ± 206.388
2025-09-14 04:25:05,130 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [1449.8721, 1067.0425, 1171.0748, 724.8869, 985.3664, 992.64417, 1086.9664, 839.45337, 762.48944, 1174.9241]
2025-09-14 04:25:05,131 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [509.0, 363.0, 420.0, 262.0, 372.0, 334.0, 369.0, 296.0, 302.0, 376.0]
2025-09-14 04:25:05,136 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 52/100 (estimated time remaining: 10 hours, 11 minutes, 16 seconds)
2025-09-14 04:36:09,658 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 04:36:09,676 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 04:37:42,095 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 891.42206 ± 372.186
2025-09-14 04:37:42,096 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [1253.762, 1161.8013, 520.9084, 1054.2499, 1162.2655, 659.8091, 1058.7693, 1167.3185, 5.1622534, 870.17505]
2025-09-14 04:37:42,096 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [415.0, 370.0, 225.0, 360.0, 446.0, 224.0, 335.0, 398.0, 18.0, 292.0]
2025-09-14 04:37:42,102 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 53/100 (estimated time remaining: 9 hours, 59 minutes, 40 seconds)
2025-09-14 04:48:13,868 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 04:48:13,876 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 04:49:54,354 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 946.92755 ± 265.647
2025-09-14 04:49:54,358 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [997.3996, 380.9966, 1334.624, 794.1999, 1022.85895, 1142.9165, 772.04706, 904.54004, 828.36804, 1291.3237]
2025-09-14 04:49:54,358 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [350.0, 188.0, 455.0, 262.0, 341.0, 382.0, 286.0, 318.0, 291.0, 452.0]
2025-09-14 04:49:54,368 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 54/100 (estimated time remaining: 9 hours, 44 minutes, 20 seconds)
2025-09-14 05:00:34,741 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 05:00:34,743 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 05:02:23,936 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 1026.21350 ± 185.007
2025-09-14 05:02:23,937 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [934.9103, 1273.9603, 1245.9075, 1309.3358, 964.4429, 963.5074, 1033.5278, 837.39795, 987.9166, 711.22705]
2025-09-14 05:02:23,937 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [341.0, 426.0, 421.0, 422.0, 361.0, 370.0, 374.0, 330.0, 344.0, 279.0]
2025-09-14 05:02:23,946 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 55/100 (estimated time remaining: 9 hours, 30 minutes, 4 seconds)
2025-09-14 05:13:08,487 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 05:13:08,489 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 05:14:38,450 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 868.27429 ± 455.370
2025-09-14 05:14:38,451 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [1037.6414, 934.7736, 18.543352, 1458.0155, 1157.6277, 1180.8851, 1005.1933, 0.15889274, 995.38983, 894.515]
2025-09-14 05:14:38,451 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [340.0, 334.0, 33.0, 494.0, 443.0, 379.0, 371.0, 12.0, 337.0, 318.0]
2025-09-14 05:14:38,472 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 56/100 (estimated time remaining: 9 hours, 16 minutes, 27 seconds)
2025-09-14 05:25:32,920 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 05:25:32,923 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 05:27:15,942 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 1001.11621 ± 346.440
2025-09-14 05:27:15,945 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [960.7439, 1176.9923, 1165.6505, 1131.7922, 1108.1735, 947.70984, 9.743266, 1052.7869, 1328.8595, 1128.71]
2025-09-14 05:27:15,945 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [343.0, 396.0, 391.0, 409.0, 381.0, 372.0, 23.0, 352.0, 437.0, 405.0]
2025-09-14 05:27:15,957 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 57/100 (estimated time remaining: 9 hours, 7 minutes, 11 seconds)
2025-09-14 05:37:54,425 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 05:37:54,428 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 05:39:07,046 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 654.28290 ± 545.126
2025-09-14 05:39:07,047 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [1162.0989, 658.2308, 70.275955, 1386.7935, 1139.0637, 4.061257, 12.526893, 9.2970085, 914.4551, 1186.0265]
2025-09-14 05:39:07,047 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [402.0, 285.0, 97.0, 470.0, 375.0, 14.0, 22.0, 32.0, 310.0, 427.0]
2025-09-14 05:39:07,057 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 58/100 (estimated time remaining: 8 hours, 48 minutes, 10 seconds)
2025-09-14 05:49:52,200 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 05:49:52,208 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 05:51:00,901 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 643.99939 ± 545.546
2025-09-14 05:51:00,916 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [1295.5901, 792.9289, 1186.5214, 4.065083, 953.7364, 840.21344, 1338.824, 17.332579, 0.7920899, 9.990294]
2025-09-14 05:51:00,916 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [437.0, 285.0, 412.0, 18.0, 314.0, 315.0, 470.0, 30.0, 14.0, 26.0]
2025-09-14 05:51:00,938 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 59/100 (estimated time remaining: 8 hours, 33 minutes, 19 seconds)
2025-09-14 06:01:30,710 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 06:01:30,712 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 06:03:30,100 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 1113.95435 ± 422.381
2025-09-14 06:03:30,101 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [866.55286, 337.76764, 1880.0171, 653.8968, 1394.3501, 1560.8694, 1067.0236, 999.72217, 1098.0006, 1281.3431]
2025-09-14 06:03:30,101 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [329.0, 173.0, 639.0, 277.0, 449.0, 510.0, 375.0, 406.0, 389.0, 453.0]
2025-09-14 06:03:30,111 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 60/100 (estimated time remaining: 8 hours, 21 minutes, 2 seconds)
2025-09-14 06:14:14,074 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 06:14:14,077 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 06:16:16,113 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 1178.12451 ± 308.508
2025-09-14 06:16:16,114 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [884.2037, 784.9968, 1062.8273, 1451.1917, 890.0813, 1463.0663, 1371.2386, 794.92535, 1622.1302, 1456.5841]
2025-09-14 06:16:16,114 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [347.0, 275.0, 363.0, 499.0, 322.0, 497.0, 435.0, 282.0, 542.0, 480.0]
2025-09-14 06:16:16,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 61/100 (estimated time remaining: 8 hours, 13 minutes, 1 second)
2025-09-14 06:27:16,488 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 06:27:16,491 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 06:29:08,531 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 1087.84253 ± 276.032
2025-09-14 06:29:08,534 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [1425.2603, 1453.6804, 816.90106, 1009.45917, 812.188, 1515.428, 964.89996, 1229.6038, 864.40027, 786.60596]
2025-09-14 06:29:08,534 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [465.0, 485.0, 296.0, 333.0, 293.0, 506.0, 328.0, 418.0, 305.0, 309.0]
2025-09-14 06:29:08,541 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 62/100 (estimated time remaining: 8 hours, 2 minutes, 38 seconds)
2025-09-14 06:39:42,542 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 06:39:42,545 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 06:41:18,072 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 916.99866 ± 401.527
2025-09-14 06:41:18,073 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [1225.0128, 713.5051, 741.7918, 0.9212637, 661.4202, 1087.2709, 1545.6578, 1240.9913, 1023.7767, 929.6388]
2025-09-14 06:41:18,073 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [424.0, 262.0, 313.0, 23.0, 275.0, 388.0, 499.0, 428.0, 343.0, 321.0]
2025-09-14 06:41:18,080 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 63/100 (estimated time remaining: 7 hours, 52 minutes, 35 seconds)
2025-09-14 06:52:17,362 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 06:52:17,371 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 06:54:13,296 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 1176.26355 ± 353.086
2025-09-14 06:54:13,298 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [813.52985, 1069.3679, 906.7734, 1928.3154, 1480.7487, 1005.9149, 1639.4586, 997.7646, 1005.853, 914.90924]
2025-09-14 06:54:13,298 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [272.0, 389.0, 298.0, 611.0, 459.0, 360.0, 531.0, 343.0, 339.0, 325.0]
2025-09-14 06:54:13,307 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 64/100 (estimated time remaining: 7 hours, 47 minutes, 43 seconds)
2025-09-14 07:04:56,864 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 07:04:56,867 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 07:06:43,187 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 1003.66370 ± 92.754
2025-09-14 07:06:43,190 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [1065.1068, 1018.7037, 793.6598, 1028.0416, 889.00635, 1069.1357, 1004.18774, 1023.36316, 1002.08215, 1143.3497]
2025-09-14 07:06:43,190 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [352.0, 360.0, 321.0, 383.0, 299.0, 391.0, 331.0, 325.0, 368.0, 396.0]
2025-09-14 07:06:43,200 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 65/100 (estimated time remaining: 7 hours, 35 minutes, 10 seconds)
2025-09-14 07:17:26,108 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 07:17:26,112 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 07:19:07,439 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 940.68860 ± 277.825
2025-09-14 07:19:07,445 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [706.2094, 777.31433, 903.009, 752.1791, 1177.8654, 1458.5111, 430.45786, 1184.1877, 1008.08716, 1009.0657]
2025-09-14 07:19:07,445 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [276.0, 298.0, 333.0, 285.0, 399.0, 495.0, 194.0, 410.0, 325.0, 353.0]
2025-09-14 07:19:07,451 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 66/100 (estimated time remaining: 7 hours, 19 minutes, 59 seconds)
2025-09-14 07:29:39,984 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 07:29:39,992 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 07:31:24,773 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 1035.52417 ± 144.969
2025-09-14 07:31:24,774 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [1038.4005, 972.49146, 915.876, 1047.2562, 1314.4069, 879.7924, 1247.3145, 1109.0648, 987.90106, 842.7386]
2025-09-14 07:31:24,774 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [360.0, 319.0, 320.0, 350.0, 453.0, 346.0, 426.0, 378.0, 310.0, 299.0]
2025-09-14 07:31:24,783 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 67/100 (estimated time remaining: 7 hours, 3 minutes, 26 seconds)
2025-09-14 07:42:21,131 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 07:42:21,133 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 07:44:08,055 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 1035.41223 ± 224.019
2025-09-14 07:44:08,064 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [725.0819, 1161.2925, 945.42053, 1041.3273, 881.22375, 1135.6914, 1558.969, 1110.26, 775.0823, 1019.77484]
2025-09-14 07:44:08,065 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [282.0, 414.0, 349.0, 381.0, 296.0, 416.0, 509.0, 355.0, 265.0, 366.0]
2025-09-14 07:44:08,091 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 68/100 (estimated time remaining: 6 hours, 54 minutes, 42 seconds)
2025-09-14 07:54:35,858 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 07:54:35,870 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 07:56:13,415 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 911.31262 ± 529.357
2025-09-14 07:56:13,418 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [1061.5736, 1689.7225, 716.9913, 1005.1276, 1498.4901, 841.8347, 5.4486604, 8.866567, 1275.9417, 1009.12946]
2025-09-14 07:56:13,418 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [371.0, 557.0, 273.0, 361.0, 509.0, 338.0, 27.0, 22.0, 429.0, 364.0]
2025-09-14 07:56:13,431 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 69/100 (estimated time remaining: 6 hours, 36 minutes, 48 seconds)
2025-09-14 08:07:14,635 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 08:07:14,638 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 08:09:12,198 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 1098.67957 ± 215.643
2025-09-14 08:09:12,200 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [1348.1578, 965.1607, 1004.935, 1131.6151, 1002.69434, 1610.0734, 971.61725, 1045.033, 1100.9174, 806.5917]
2025-09-14 08:09:12,200 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [464.0, 381.0, 355.0, 389.0, 344.0, 543.0, 339.0, 376.0, 402.0, 281.0]
2025-09-14 08:09:12,212 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 70/100 (estimated time remaining: 6 hours, 27 minutes, 23 seconds)
2025-09-14 08:19:45,152 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 08:19:45,154 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 08:21:31,515 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 1006.25226 ± 582.364
2025-09-14 08:21:31,522 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [2254.73, 1146.8093, 1190.7413, 283.5336, 1393.7648, 732.87976, 987.188, 897.5702, 1166.0739, 9.231782]
2025-09-14 08:21:31,522 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [791.0, 381.0, 408.0, 131.0, 514.0, 280.0, 328.0, 316.0, 376.0, 23.0]
2025-09-14 08:21:31,555 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 71/100 (estimated time remaining: 6 hours, 14 minutes, 24 seconds)
2025-09-14 08:32:29,154 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 08:32:29,157 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 08:34:08,874 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 956.41150 ± 274.803
2025-09-14 08:34:08,882 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [1044.8522, 1370.4418, 1140.3503, 843.5167, 816.0252, 1243.7188, 919.368, 395.11884, 1122.5375, 668.187]
2025-09-14 08:34:08,883 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [432.0, 460.0, 379.0, 293.0, 312.0, 389.0, 313.0, 170.0, 397.0, 245.0]
2025-09-14 08:34:08,898 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 72/100 (estimated time remaining: 6 hours, 3 minutes, 51 seconds)
2025-09-14 08:44:51,559 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 08:44:51,566 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 08:45:53,229 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 558.66003 ± 533.083
2025-09-14 08:45:53,229 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [261.04672, 1370.0543, 892.98676, 879.4737, 867.2974, 1301.6952, 5.2645006, 2.6189437, 6.1324363, 0.030057333]
2025-09-14 08:45:53,229 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [148.0, 489.0, 334.0, 301.0, 320.0, 430.0, 18.0, 17.0, 21.0, 20.0]
2025-09-14 08:45:53,237 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 73/100 (estimated time remaining: 5 hours, 45 minutes, 48 seconds)
2025-09-14 08:56:40,954 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 08:56:40,974 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 08:58:32,174 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 1032.80627 ± 267.577
2025-09-14 08:58:32,176 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [1309.1904, 1655.7917, 937.32294, 632.74603, 1015.60675, 860.3708, 889.8205, 963.04315, 1148.2073, 915.9627]
2025-09-14 08:58:32,176 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [462.0, 574.0, 331.0, 267.0, 377.0, 312.0, 300.0, 344.0, 384.0, 327.0]
2025-09-14 08:58:32,184 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 74/100 (estimated time remaining: 5 hours, 36 minutes, 29 seconds)
2025-09-14 09:09:15,261 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 09:09:15,263 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 09:10:59,784 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 1036.27380 ± 293.527
2025-09-14 09:10:59,785 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [901.371, 1182.0928, 1125.0264, 801.12585, 667.5735, 799.75085, 1359.4529, 1682.3721, 824.3804, 1019.5915]
2025-09-14 09:10:59,785 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [305.0, 375.0, 415.0, 286.0, 232.0, 322.0, 470.0, 565.0, 279.0, 335.0]
2025-09-14 09:10:59,795 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 75/100 (estimated time remaining: 5 hours, 21 minutes, 19 seconds)
2025-09-14 09:21:54,563 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 09:21:54,571 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 09:23:22,003 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 831.37433 ± 403.711
2025-09-14 09:23:22,017 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [1135.4641, 558.336, 793.78394, 1101.7965, 1356.3204, 1009.4924, 1216.2883, 7.7165527, 806.9594, 327.58508]
2025-09-14 09:23:22,017 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [388.0, 226.0, 294.0, 376.0, 472.0, 326.0, 408.0, 21.0, 311.0, 160.0]
2025-09-14 09:23:22,037 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 76/100 (estimated time remaining: 5 hours, 9 minutes, 12 seconds)
2025-09-14 09:33:44,485 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 09:33:44,492 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 09:35:34,400 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 1053.96606 ± 481.344
2025-09-14 09:35:34,426 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [871.2433, 1037.1749, 1379.3315, 3.5541224, 1068.2184, 553.6595, 1052.2734, 1590.3938, 1767.7405, 1216.0718]
2025-09-14 09:35:34,426 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [307.0, 356.0, 482.0, 20.0, 380.0, 220.0, 374.0, 519.0, 575.0, 443.0]
2025-09-14 09:35:34,444 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 77/100 (estimated time remaining: 4 hours, 54 minutes, 50 seconds)
2025-09-14 09:46:16,241 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 09:46:16,244 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 09:47:59,732 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 1017.29199 ± 544.713
2025-09-14 09:47:59,734 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [1005.15466, 1256.0636, 1176.72, 962.3442, 12.0364685, 1531.8518, 2.3328042, 1133.6538, 1582.8358, 1509.9271]
2025-09-14 09:47:59,734 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [350.0, 433.0, 380.0, 321.0, 33.0, 511.0, 18.0, 378.0, 543.0, 479.0]
2025-09-14 09:47:59,763 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 78/100 (estimated time remaining: 4 hours, 45 minutes, 42 seconds)
2025-09-14 09:58:37,638 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 09:58:37,641 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 10:00:35,041 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 1181.77502 ± 357.028
2025-09-14 10:00:35,043 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [1278.1003, 1116.0796, 1205.7969, 1263.7207, 960.84235, 394.23648, 923.95435, 1774.4331, 1557.1937, 1343.3921]
2025-09-14 10:00:35,043 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [414.0, 382.0, 416.0, 401.0, 299.0, 248.0, 308.0, 591.0, 545.0, 447.0]
2025-09-14 10:00:35,056 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 79/100 (estimated time remaining: 4 hours, 33 minutes)
2025-09-14 10:11:34,161 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 10:11:34,164 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 10:13:23,558 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 989.96289 ± 223.940
2025-09-14 10:13:23,559 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [808.2204, 1178.6218, 1020.1217, 1348.0105, 994.16455, 941.54407, 806.8835, 812.50134, 651.0216, 1338.5386]
2025-09-14 10:13:23,559 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [288.0, 410.0, 326.0, 463.0, 390.0, 314.0, 311.0, 298.0, 477.0, 461.0]
2025-09-14 10:13:23,569 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 80/100 (estimated time remaining: 4 hours, 22 minutes, 3 seconds)
2025-09-14 10:23:39,832 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 10:23:39,845 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 10:25:25,297 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 1025.35925 ± 177.610
2025-09-14 10:25:25,300 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [970.4266, 1038.1622, 745.1105, 917.1032, 1378.9623, 1254.8639, 841.54114, 1052.7286, 1097.2074, 957.4872]
2025-09-14 10:25:25,300 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [338.0, 353.0, 270.0, 309.0, 456.0, 453.0, 302.0, 355.0, 378.0, 326.0]
2025-09-14 10:25:25,311 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 81/100 (estimated time remaining: 4 hours, 8 minutes, 13 seconds)
2025-09-14 10:36:24,986 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 10:36:24,990 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 10:38:15,494 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 1058.25183 ± 308.043
2025-09-14 10:38:15,496 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [611.6906, 1124.5715, 958.74603, 1105.0741, 693.627, 1437.1417, 1219.2146, 974.6401, 1656.3054, 801.507]
2025-09-14 10:38:15,496 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [259.0, 362.0, 319.0, 364.0, 275.0, 501.0, 409.0, 344.0, 584.0, 285.0]
2025-09-14 10:38:15,508 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 82/100 (estimated time remaining: 3 hours, 58 minutes, 12 seconds)
2025-09-14 10:48:35,708 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 10:48:35,711 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 10:50:11,269 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 966.06299 ± 381.605
2025-09-14 10:50:11,271 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [1452.0802, 3.4014401, 935.2468, 929.3941, 1194.1387, 1099.8389, 1018.57117, 1284.9221, 1093.7551, 649.28107]
2025-09-14 10:50:11,271 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [506.0, 13.0, 308.0, 329.0, 390.0, 378.0, 336.0, 427.0, 358.0, 242.0]
2025-09-14 10:50:11,285 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 83/100 (estimated time remaining: 3 hours, 43 minutes, 53 seconds)
2025-09-14 11:01:00,401 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 11:01:00,405 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 11:02:36,659 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 972.62891 ± 118.543
2025-09-14 11:02:36,660 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [918.3478, 1074.3312, 968.6969, 773.2571, 880.5823, 1028.3569, 1012.37915, 1234.0516, 914.181, 922.10504]
2025-09-14 11:02:36,660 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [315.0, 349.0, 325.0, 288.0, 290.0, 343.0, 360.0, 409.0, 324.0, 302.0]
2025-09-14 11:02:36,670 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 84/100 (estimated time remaining: 3 hours, 30 minutes, 53 seconds)
2025-09-14 11:13:03,657 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 11:13:03,659 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 11:14:54,407 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 1124.50964 ± 147.826
2025-09-14 11:14:54,409 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [1333.2847, 1276.7754, 1375.1129, 1083.6417, 980.41895, 1165.8934, 1026.6375, 1054.6779, 923.13715, 1025.5176]
2025-09-14 11:14:54,409 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [422.0, 413.0, 437.0, 343.0, 353.0, 391.0, 341.0, 369.0, 312.0, 334.0]
2025-09-14 11:14:54,420 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 85/100 (estimated time remaining: 3 hours, 16 minutes, 50 seconds)
2025-09-14 11:26:03,277 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 11:26:03,279 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 11:27:24,146 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 776.00354 ± 543.554
2025-09-14 11:27:24,148 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [1458.3735, 1257.0546, 702.49097, 7.5168743, 10.702453, 2.952162, 1228.8014, 1259.4504, 910.81354, 921.8794]
2025-09-14 11:27:24,148 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [581.0, 430.0, 258.0, 21.0, 21.0, 13.0, 409.0, 425.0, 299.0, 315.0]
2025-09-14 11:27:24,158 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 86/100 (estimated time remaining: 3 hours, 5 minutes, 56 seconds)
2025-09-14 11:37:47,865 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 11:37:47,867 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 11:39:22,598 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 924.02655 ± 413.699
2025-09-14 11:39:22,599 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [1330.2617, 1192.6288, 897.90875, 1505.7322, 1127.8177, 1021.4064, 850.1614, 6.639782, 440.34128, 867.3681]
2025-09-14 11:39:22,599 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [418.0, 404.0, 312.0, 495.0, 377.0, 377.0, 317.0, 18.0, 174.0, 291.0]
2025-09-14 11:39:22,607 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 87/100 (estimated time remaining: 2 hours, 51 minutes, 7 seconds)
2025-09-14 11:49:59,060 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 11:49:59,063 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 11:51:55,090 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 1136.16187 ± 242.821
2025-09-14 11:51:55,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [1111.6617, 790.9006, 1237.0101, 1109.5999, 1757.57, 1134.386, 973.6436, 1160.6119, 923.8243, 1162.411]
2025-09-14 11:51:55,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [374.0, 275.0, 425.0, 430.0, 630.0, 410.0, 348.0, 386.0, 313.0, 399.0]
2025-09-14 11:51:55,105 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 88/100 (estimated time remaining: 2 hours, 40 minutes, 29 seconds)
2025-09-14 12:02:34,662 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 12:02:34,675 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 12:04:39,868 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 1256.52954 ± 153.539
2025-09-14 12:04:39,882 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [1048.0183, 1450.4415, 1338.6473, 1083.1481, 1379.7153, 1452.4504, 1268.9342, 1170.3114, 1340.1003, 1033.5297]
2025-09-14 12:04:39,882 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [355.0, 491.0, 418.0, 356.0, 449.0, 498.0, 437.0, 388.0, 434.0, 348.0]
2025-09-14 12:04:39,902 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 89/100 (estimated time remaining: 2 hours, 28 minutes, 55 seconds)
2025-09-14 12:15:40,778 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 12:15:40,780 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 12:17:15,348 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 946.25769 ± 413.563
2025-09-14 12:17:15,355 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [3.705185, 574.5699, 947.9947, 1370.0933, 818.54114, 1516.053, 896.098, 885.21173, 1199.0536, 1251.2555]
2025-09-14 12:17:15,355 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [13.0, 231.0, 318.0, 430.0, 296.0, 518.0, 320.0, 290.0, 397.0, 420.0]
2025-09-14 12:17:15,367 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 90/100 (estimated time remaining: 2 hours, 17 minutes, 10 seconds)
2025-09-14 12:27:36,081 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 12:27:36,089 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 12:29:31,238 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 1204.55652 ± 250.220
2025-09-14 12:29:31,240 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [1087.2073, 1499.1715, 942.82605, 1181.4354, 1370.3843, 731.714, 1329.193, 1523.0226, 1400.1309, 980.48035]
2025-09-14 12:29:31,240 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [358.0, 466.0, 310.0, 402.0, 480.0, 226.0, 437.0, 487.0, 452.0, 327.0]
2025-09-14 12:29:31,248 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 91/100 (estimated time remaining: 2 hours, 4 minutes, 14 seconds)
2025-09-14 12:39:57,573 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 12:39:57,582 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 12:41:29,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 921.81311 ± 486.656
2025-09-14 12:41:29,827 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [1362.1534, 5.52116, 1086.2867, 859.2287, 837.80096, 1216.0961, 1206.299, 17.141987, 1355.9508, 1271.6528]
2025-09-14 12:41:29,827 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [462.0, 17.0, 355.0, 316.0, 296.0, 383.0, 384.0, 32.0, 433.0, 416.0]
2025-09-14 12:41:29,836 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 92/100 (estimated time remaining: 1 hour, 51 minutes, 49 seconds)
2025-09-14 12:52:07,270 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 12:52:07,273 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 12:53:42,140 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 963.78241 ± 256.889
2025-09-14 12:53:42,142 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [964.36365, 1011.5552, 979.83105, 1045.2517, 830.02637, 1324.8685, 367.98517, 815.0161, 1317.4067, 981.51984]
2025-09-14 12:53:42,142 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [336.0, 336.0, 350.0, 361.0, 278.0, 397.0, 146.0, 285.0, 421.0, 343.0]
2025-09-14 12:53:42,154 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 93/100 (estimated time remaining: 1 hour, 38 minutes, 51 seconds)
2025-09-14 13:04:52,990 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 13:04:52,999 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 13:07:02,576 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 1309.32422 ± 272.335
2025-09-14 13:07:02,579 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [1483.8184, 1028.788, 1731.7684, 1541.2937, 1425.5331, 1231.795, 1276.5037, 1538.2017, 816.18097, 1019.36084]
2025-09-14 13:07:02,579 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [490.0, 366.0, 561.0, 498.0, 487.0, 419.0, 468.0, 491.0, 291.0, 359.0]
2025-09-14 13:07:02,592 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 27 minutes, 19 seconds)
2025-09-14 13:17:47,164 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 13:17:47,172 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 13:19:28,947 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 974.45410 ± 203.217
2025-09-14 13:19:28,948 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [1318.4287, 1188.0115, 639.28314, 1010.58325, 1070.1444, 630.5682, 993.4545, 1027.9867, 907.26733, 958.8132]
2025-09-14 13:19:28,948 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [446.0, 467.0, 229.0, 349.0, 347.0, 239.0, 353.0, 327.0, 306.0, 327.0]
2025-09-14 13:19:28,960 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 14 minutes, 40 seconds)
2025-09-14 13:29:45,202 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 13:29:45,210 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 13:31:22,008 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 983.29034 ± 215.093
2025-09-14 13:31:22,010 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [566.69275, 1279.4254, 863.91644, 1118.5535, 1141.3849, 1139.02, 956.7756, 1142.1084, 945.4667, 679.5598]
2025-09-14 13:31:22,010 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [209.0, 407.0, 301.0, 391.0, 397.0, 356.0, 327.0, 365.0, 322.0, 243.0]
2025-09-14 13:31:22,020 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 1 minute, 50 seconds)
2025-09-14 13:41:44,077 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 13:41:44,080 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 13:43:41,890 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 1158.52112 ± 257.934
2025-09-14 13:43:41,891 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [782.6782, 777.2772, 1250.248, 1305.8761, 976.77844, 1091.3262, 1683.3253, 1279.0155, 1136.6271, 1302.0591]
2025-09-14 13:43:41,891 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [302.0, 285.0, 402.0, 430.0, 364.0, 398.0, 512.0, 429.0, 387.0, 416.0]
2025-09-14 13:43:41,899 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 97/100 (estimated time remaining: 49 minutes, 45 seconds)
2025-09-14 13:54:18,456 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 13:54:18,458 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 13:56:04,632 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 1097.95520 ± 171.594
2025-09-14 13:56:04,634 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [1275.0605, 865.8039, 1052.3715, 1118.7946, 1384.0039, 950.24524, 1346.3512, 919.809, 1008.0744, 1059.0378]
2025-09-14 13:56:04,634 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [423.0, 301.0, 328.0, 386.0, 464.0, 321.0, 422.0, 302.0, 336.0, 350.0]
2025-09-14 13:56:04,644 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 98/100 (estimated time remaining: 37 minutes, 25 seconds)
2025-09-14 14:07:01,722 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 14:07:01,727 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 14:08:55,914 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 1130.04114 ± 225.390
2025-09-14 14:08:55,916 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [1464.4335, 1048.6901, 1348.4282, 611.25055, 975.4274, 1172.8376, 1310.1729, 1089.8627, 1059.8167, 1219.4913]
2025-09-14 14:08:55,916 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [479.0, 380.0, 448.0, 236.0, 342.0, 371.0, 429.0, 362.0, 352.0, 439.0]
2025-09-14 14:08:55,925 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 99/100 (estimated time remaining: 24 minutes, 45 seconds)
2025-09-14 14:19:23,025 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 14:19:23,033 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 14:20:51,737 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 888.93323 ± 462.798
2025-09-14 14:20:51,738 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [1039.5171, 11.817093, 6.1282353, 964.0949, 1176.7949, 1084.6243, 1247.1901, 1395.4327, 1125.9065, 837.82605]
2025-09-14 14:20:51,739 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [351.0, 22.0, 18.0, 362.0, 377.0, 379.0, 404.0, 466.0, 386.0, 275.0]
2025-09-14 14:20:51,748 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 100/100 (estimated time remaining: 12 minutes, 16 seconds)
2025-09-14 14:31:30,862 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 14:31:30,869 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 14:33:27,358 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 1126.16980 ± 321.483
2025-09-14 14:33:27,359 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [1489.1068, 1181.97, 494.35757, 1001.34814, 1293.7567, 1251.1772, 816.4439, 1595.1337, 1311.9471, 826.45703]
2025-09-14 14:33:27,359 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [550.0, 404.0, 196.0, 329.0, 452.0, 422.0, 299.0, 578.0, 452.0, 303.0]
2025-09-14 14:33:27,375 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1251 [DEBUG]: Training session finished
