2025-09-13 18:23:04,288 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noiseperc5-walker2d/ExtremeSparseL4U32-mbpac_memdelay
2025-09-13 18:23:04,288 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noiseperc5-walker2d/ExtremeSparseL4U32-mbpac_memdelay
2025-09-13 18:23:04,288 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x145d25a91550>}
2025-09-13 18:23:04,288 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1111 [DEBUG]: using device: cuda
2025-09-13 18:23:04,294 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1133 [INFO]: Creating new trainer
2025-09-13 18:23:04,617 baseline-mbpac-noiseperc5-walker2d:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-09-13 18:23:04,617 baseline-mbpac-noiseperc5-walker2d:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-13 18:23:04,625 baseline-mbpac-noiseperc5-walker2d:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=6, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 384, batch_first=True)
)
2025-09-13 18:23:06,845 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1194 [DEBUG]: Starting training session...
2025-09-13 18:23:06,845 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 1/100
2025-09-13 18:34:02,458 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:34:02,473 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:34:52,736 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 249.60307 ± 100.995
2025-09-13 18:34:52,738 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [85.035934, 292.87283, 298.62698, 296.68732, 322.11185, 293.7466, 315.9556, 278.48978, 17.41414, 295.0899]
2025-09-13 18:34:52,738 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [213.0, 167.0, 180.0, 182.0, 203.0, 184.0, 187.0, 165.0, 28.0, 177.0]
2025-09-13 18:34:52,738 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1226 [INFO]: New best (249.60) for latency ExtremeSparseL4U32
2025-09-13 18:34:52,760 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 2/100 (estimated time remaining: 19 hours, 24 minutes, 45 seconds)
2025-09-13 18:45:32,238 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:45:32,245 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:47:03,812 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 263.44846 ± 183.351
2025-09-13 18:47:03,814 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [186.83794, 310.91684, 473.84973, 185.23497, 95.048584, 716.1357, 184.55444, 150.17645, 109.81132, 221.91867]
2025-09-13 18:47:03,814 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [121.0, 274.0, 633.0, 104.0, 92.0, 1000.0, 278.0, 127.0, 89.0, 345.0]
2025-09-13 18:47:03,814 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1226 [INFO]: New best (263.45) for latency ExtremeSparseL4U32
2025-09-13 18:47:03,829 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 3/100 (estimated time remaining: 19 hours, 33 minutes, 32 seconds)
2025-09-13 18:57:47,778 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:57:47,780 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:58:51,620 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 311.54266 ± 118.095
2025-09-13 18:58:51,621 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [453.29446, 329.34912, 471.71426, 369.98035, 348.33093, 300.77325, 43.808563, 320.59424, 294.31458, 183.26688]
2025-09-13 18:58:51,621 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [270.0, 198.0, 436.0, 223.0, 171.0, 211.0, 160.0, 186.0, 181.0, 109.0]
2025-09-13 18:58:51,621 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1226 [INFO]: New best (311.54) for latency ExtremeSparseL4U32
2025-09-13 18:58:51,630 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 4/100 (estimated time remaining: 19 hours, 15 minutes, 48 seconds)
2025-09-13 19:09:37,704 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:09:37,706 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:10:51,129 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 324.70361 ± 212.211
2025-09-13 19:10:51,130 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [24.366722, 266.90753, 680.8012, 559.5449, -29.894382, 323.94296, 255.52887, 248.37044, 494.48355, 422.98413]
2025-09-13 19:10:51,130 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [34.0, 165.0, 474.0, 437.0, 171.0, 200.0, 142.0, 169.0, 325.0, 300.0]
2025-09-13 19:10:51,130 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1226 [INFO]: New best (324.70) for latency ExtremeSparseL4U32
2025-09-13 19:10:51,138 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 5/100 (estimated time remaining: 19 hours, 5 minutes, 43 seconds)
2025-09-13 19:21:26,807 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:21:26,809 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:22:37,743 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 384.06738 ± 109.276
2025-09-13 19:22:37,745 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [384.6816, 487.6116, 403.07443, 392.9526, 444.94818, 70.40786, 379.65527, 405.814, 448.50098, 423.02725]
2025-09-13 19:22:37,745 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [256.0, 248.0, 291.0, 203.0, 244.0, 133.0, 233.0, 270.0, 232.0, 268.0]
2025-09-13 19:22:37,745 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1226 [INFO]: New best (384.07) for latency ExtremeSparseL4U32
2025-09-13 19:22:37,754 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 6/100 (estimated time remaining: 18 hours, 50 minutes, 47 seconds)
2025-09-13 19:33:18,743 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:33:18,745 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:34:23,925 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 271.53442 ± 130.100
2025-09-13 19:34:23,927 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [331.4514, 443.8897, 277.54446, 272.1081, 133.88275, 222.87032, 480.64554, 59.90527, 135.06375, 357.9832]
2025-09-13 19:34:23,927 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [252.0, 302.0, 204.0, 177.0, 147.0, 291.0, 329.0, 71.0, 156.0, 243.0]
2025-09-13 19:34:23,934 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 7/100 (estimated time remaining: 18 hours, 38 minutes, 58 seconds)
2025-09-13 19:45:29,805 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:45:29,815 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:46:24,236 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 260.75653 ± 129.999
2025-09-13 19:46:24,236 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [333.25424, 401.63815, 285.6472, 282.48184, 288.69534, 221.46886, 24.650331, 369.11288, 19.087484, 381.5292]
2025-09-13 19:46:24,236 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [187.0, 262.0, 168.0, 389.0, 174.0, 117.0, 56.0, 240.0, 33.0, 186.0]
2025-09-13 19:46:24,249 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 8/100 (estimated time remaining: 18 hours, 23 minutes, 43 seconds)
2025-09-13 19:56:40,890 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:56:40,893 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:57:50,448 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 249.12265 ± 172.083
2025-09-13 19:57:50,451 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3.5074656, 490.9321, 186.67308, 352.4306, 521.6438, 65.62452, 218.46335, 146.59428, 105.47704, 399.88034]
2025-09-13 19:57:50,452 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [12.0, 301.0, 279.0, 487.0, 383.0, 79.0, 156.0, 174.0, 198.0, 229.0]
2025-09-13 19:57:50,458 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 9/100 (estimated time remaining: 18 hours, 5 minutes, 14 seconds)
2025-09-13 20:08:32,849 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:08:32,851 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:10:06,335 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 484.12592 ± 183.061
2025-09-13 20:10:06,336 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [289.21304, 562.26404, 410.33038, 469.3691, 462.85196, 346.81793, 488.10028, 381.07806, 443.37177, 987.86255]
2025-09-13 20:10:06,336 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [166.0, 374.0, 214.0, 278.0, 301.0, 207.0, 282.0, 240.0, 297.0, 736.0]
2025-09-13 20:10:06,336 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1226 [INFO]: New best (484.13) for latency ExtremeSparseL4U32
2025-09-13 20:10:06,340 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 10/100 (estimated time remaining: 17 hours, 58 minutes, 24 seconds)
2025-09-13 20:20:54,219 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:20:54,232 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:21:59,239 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 377.55865 ± 201.061
2025-09-13 20:21:59,253 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [33.299084, 351.38144, 103.44683, 406.7848, 301.93292, 366.49133, 525.2715, 434.90875, 460.76022, 791.30963]
2025-09-13 20:21:59,253 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [41.0, 190.0, 165.0, 200.0, 184.0, 216.0, 267.0, 248.0, 259.0, 409.0]
2025-09-13 20:21:59,270 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 11/100 (estimated time remaining: 17 hours, 48 minutes, 27 seconds)
2025-09-13 20:32:38,423 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:32:38,425 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:33:21,897 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 220.20596 ± 104.792
2025-09-13 20:33:21,897 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [173.10844, 248.27147, 305.2732, 3.8089406, 279.0767, 257.75208, 197.4934, 214.85164, 409.47067, 112.95308]
2025-09-13 20:33:21,897 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [120.0, 189.0, 157.0, 14.0, 162.0, 149.0, 112.0, 114.0, 321.0, 104.0]
2025-09-13 20:33:21,904 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 12/100 (estimated time remaining: 17 hours, 29 minutes, 35 seconds)
2025-09-13 20:44:00,944 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:44:00,945 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:45:05,303 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 364.46088 ± 86.658
2025-09-13 20:45:05,306 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [329.6495, 410.19897, 324.55408, 344.17764, 326.14285, 225.86057, 417.92377, 472.7386, 269.30087, 524.06177]
2025-09-13 20:45:05,306 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [178.0, 285.0, 173.0, 201.0, 192.0, 135.0, 214.0, 233.0, 150.0, 356.0]
2025-09-13 20:45:05,313 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 13/100 (estimated time remaining: 17 hours, 12 minutes, 50 seconds)
2025-09-13 20:56:01,378 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:56:01,393 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:57:04,368 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 366.63440 ± 150.638
2025-09-13 20:57:04,381 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [367.87503, 130.169, 200.82986, 662.563, 334.72955, 223.73373, 390.38312, 498.16907, 362.36002, 495.5318]
2025-09-13 20:57:04,382 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [185.0, 75.0, 109.0, 334.0, 193.0, 129.0, 203.0, 335.0, 216.0, 295.0]
2025-09-13 20:57:04,408 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 14/100 (estimated time remaining: 17 hours, 10 minutes, 38 seconds)
2025-09-13 21:07:38,026 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:07:38,028 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:08:26,465 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 278.71790 ± 229.314
2025-09-13 21:08:26,465 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3.8077636, 370.24084, 269.43726, 502.34036, 776.68, 97.83019, 133.94344, 392.2134, 3.9111025, 236.7747]
2025-09-13 21:08:26,465 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [14.0, 215.0, 150.0, 320.0, 309.0, 96.0, 160.0, 220.0, 13.0, 133.0]
2025-09-13 21:08:26,470 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 15/100 (estimated time remaining: 16 hours, 43 minutes, 22 seconds)
2025-09-13 21:19:16,521 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:19:16,531 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:20:18,354 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 473.60962 ± 160.273
2025-09-13 21:20:18,355 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [269.26337, 636.0197, 611.62384, 201.17413, 689.3089, 430.6644, 324.49457, 629.5921, 488.25934, 455.69598]
2025-09-13 21:20:18,355 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [140.0, 242.0, 247.0, 113.0, 298.0, 189.0, 172.0, 243.0, 210.0, 224.0]
2025-09-13 21:20:18,393 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 16/100 (estimated time remaining: 16 hours, 31 minutes, 25 seconds)
2025-09-13 21:31:10,460 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:31:10,462 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:32:16,322 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 504.72003 ± 122.031
2025-09-13 21:32:16,330 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [379.78717, 651.2252, 625.3003, 372.94647, 743.70074, 465.6258, 387.80615, 475.26108, 424.18286, 521.3642]
2025-09-13 21:32:16,330 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [182.0, 257.0, 257.0, 169.0, 294.0, 181.0, 167.0, 223.0, 185.0, 289.0]
2025-09-13 21:32:16,330 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1226 [INFO]: New best (504.72) for latency ExtremeSparseL4U32
2025-09-13 21:32:16,346 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 17/100 (estimated time remaining: 16 hours, 29 minutes, 38 seconds)
2025-09-13 21:43:00,232 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:43:00,234 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:44:40,896 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 707.88440 ± 304.144
2025-09-13 21:44:40,907 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [988.5213, 128.15079, 791.20984, 857.9465, 487.98392, 410.50632, 691.24725, 1259.7493, 869.53754, 593.99133]
2025-09-13 21:44:40,907 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [430.0, 180.0, 344.0, 348.0, 216.0, 198.0, 330.0, 689.0, 351.0, 272.0]
2025-09-13 21:44:40,907 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1226 [INFO]: New best (707.88) for latency ExtremeSparseL4U32
2025-09-13 21:44:40,931 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 18/100 (estimated time remaining: 16 hours, 29 minutes, 15 seconds)
2025-09-13 21:55:07,945 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:55:07,947 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:56:40,908 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 669.21991 ± 129.304
2025-09-13 21:56:40,910 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [520.5002, 573.382, 647.3875, 668.0653, 561.25, 766.3617, 780.90533, 564.1384, 969.2325, 640.97644]
2025-09-13 21:56:40,910 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [230.0, 223.0, 294.0, 322.0, 263.0, 324.0, 347.0, 230.0, 437.0, 409.0]
2025-09-13 21:56:40,917 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 19/100 (estimated time remaining: 16 hours, 17 minutes, 34 seconds)
2025-09-13 22:07:26,579 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 22:07:26,582 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 22:08:46,227 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 660.82080 ± 175.311
2025-09-13 22:08:46,230 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [608.5629, 443.37747, 583.3553, 758.6133, 500.60443, 643.8989, 1114.2461, 606.3836, 615.97766, 733.18823]
2025-09-13 22:08:46,230 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [249.0, 228.0, 239.0, 274.0, 201.0, 268.0, 428.0, 234.0, 241.0, 281.0]
2025-09-13 22:08:46,245 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 20/100 (estimated time remaining: 16 hours, 17 minutes, 20 seconds)
2025-09-13 22:19:35,773 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 22:19:35,785 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 22:20:55,452 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 665.32703 ± 121.545
2025-09-13 22:20:55,459 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [640.0791, 715.98724, 734.6259, 465.412, 482.9907, 796.2543, 837.4123, 706.27954, 727.91376, 546.3158]
2025-09-13 22:20:55,459 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [267.0, 263.0, 282.0, 208.0, 192.0, 361.0, 293.0, 311.0, 296.0, 219.0]
2025-09-13 22:20:55,467 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 21/100 (estimated time remaining: 16 hours, 9 minutes, 53 seconds)
2025-09-13 22:31:31,932 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 22:31:31,935 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 22:33:01,800 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 764.65643 ± 72.214
2025-09-13 22:33:01,802 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [692.92017, 850.82153, 631.736, 737.30023, 796.9312, 822.84863, 763.6055, 796.7012, 864.6232, 689.0772]
2025-09-13 22:33:01,802 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [269.0, 324.0, 235.0, 271.0, 301.0, 299.0, 320.0, 374.0, 324.0, 255.0]
2025-09-13 22:33:01,802 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1226 [INFO]: New best (764.66) for latency ExtremeSparseL4U32
2025-09-13 22:33:01,810 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 22/100 (estimated time remaining: 15 hours, 59 minutes, 58 seconds)
2025-09-13 22:43:55,506 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 22:43:55,509 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 22:45:41,266 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 911.36768 ± 201.194
2025-09-13 22:45:41,268 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [782.9567, 917.52057, 822.02075, 1207.8254, 864.898, 817.7821, 875.3054, 751.92114, 1370.8809, 702.56555]
2025-09-13 22:45:41,268 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [298.0, 326.0, 308.0, 462.0, 321.0, 328.0, 331.0, 318.0, 500.0, 305.0]
2025-09-13 22:45:41,268 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1226 [INFO]: New best (911.37) for latency ExtremeSparseL4U32
2025-09-13 22:45:41,279 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 23/100 (estimated time remaining: 15 hours, 51 minutes, 41 seconds)
2025-09-13 22:56:31,093 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 22:56:31,106 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 22:57:58,292 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 742.07794 ± 169.038
2025-09-13 22:57:58,294 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [488.29202, 974.66034, 978.2005, 680.18506, 860.5816, 809.3022, 644.5774, 843.46356, 497.93973, 643.5764]
2025-09-13 22:57:58,294 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [220.0, 360.0, 357.0, 346.0, 338.0, 288.0, 249.0, 304.0, 188.0, 282.0]
2025-09-13 22:57:58,304 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 24/100 (estimated time remaining: 15 hours, 43 minutes, 51 seconds)
2025-09-13 23:08:32,225 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 23:08:32,228 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 23:10:10,641 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 703.54004 ± 368.793
2025-09-13 23:10:10,643 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [834.75824, 137.80093, 1236.0027, 720.4544, 959.8786, 795.05676, 356.75427, 1162.443, 125.56449, 706.6875]
2025-09-13 23:10:10,643 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [322.0, 210.0, 546.0, 338.0, 351.0, 315.0, 160.0, 451.0, 173.0, 393.0]
2025-09-13 23:10:10,648 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 25/100 (estimated time remaining: 15 hours, 33 minutes, 22 seconds)
2025-09-13 23:21:02,491 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 23:21:02,494 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 23:22:34,655 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 731.37665 ± 334.475
2025-09-13 23:22:34,656 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [706.32324, 938.2059, 873.74805, 994.9873, 767.7892, 113.54887, 1029.5682, 856.36993, 66.1983, 967.0272]
2025-09-13 23:22:34,656 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [328.0, 362.0, 317.0, 441.0, 311.0, 142.0, 393.0, 341.0, 84.0, 362.0]
2025-09-13 23:22:34,685 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 26/100 (estimated time remaining: 15 hours, 24 minutes, 48 seconds)
2025-09-13 23:33:04,720 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 23:33:04,722 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 23:35:14,621 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 1084.77283 ± 404.726
2025-09-13 23:35:14,643 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [786.07904, 1013.2095, 893.97736, 780.6123, 1924.8691, 862.80585, 728.4842, 1745.7917, 838.9295, 1272.9702]
2025-09-13 23:35:14,643 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [308.0, 371.0, 318.0, 313.0, 794.0, 326.0, 296.0, 785.0, 359.0, 440.0]
2025-09-13 23:35:14,643 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1226 [INFO]: New best (1084.77) for latency ExtremeSparseL4U32
2025-09-13 23:35:14,656 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 27/100 (estimated time remaining: 15 hours, 20 minutes, 46 seconds)
2025-09-13 23:46:25,756 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 23:46:25,770 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 23:48:10,437 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 942.59668 ± 324.413
2025-09-13 23:48:10,439 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [759.9756, 947.8416, 1124.3994, 1143.5237, 921.38116, 1350.9332, 1282.3889, 135.99194, 832.43726, 927.09326]
2025-09-13 23:48:10,439 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [274.0, 336.0, 404.0, 447.0, 350.0, 520.0, 471.0, 83.0, 309.0, 333.0]
2025-09-13 23:48:10,446 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 28/100 (estimated time remaining: 15 hours, 12 minutes, 17 seconds)
2025-09-13 23:58:50,172 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 23:58:50,175 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 00:00:27,225 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 837.46106 ± 371.647
2025-09-14 00:00:27,246 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [1038.2439, 893.0029, 945.0286, 836.90094, 114.71896, 855.38116, 880.86774, 787.4161, 1619.2181, 403.83243]
2025-09-14 00:00:27,246 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [350.0, 296.0, 330.0, 284.0, 143.0, 305.0, 305.0, 274.0, 663.0, 234.0]
2025-09-14 00:00:27,259 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 29/100 (estimated time remaining: 14 hours, 59 minutes, 44 seconds)
2025-09-14 00:10:51,857 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 00:10:51,860 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 00:12:43,817 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 1005.65637 ± 431.740
2025-09-14 00:12:43,819 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [1008.00653, 1006.5145, 762.9209, 885.45184, 1857.5985, 922.7402, 285.96033, 1386.894, 1413.0293, 527.44696]
2025-09-14 00:12:43,819 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [364.0, 359.0, 252.0, 337.0, 703.0, 328.0, 181.0, 534.0, 497.0, 217.0]
2025-09-14 00:12:43,840 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 30/100 (estimated time remaining: 14 hours, 48 minutes, 15 seconds)
2025-09-14 00:23:29,262 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 00:23:29,265 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 00:25:33,249 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 1224.10986 ± 531.879
2025-09-14 00:25:33,252 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [816.4679, 988.74475, 2158.6426, 147.84966, 885.60144, 1394.541, 1658.2156, 1519.3213, 1079.4542, 1592.2599]
2025-09-14 00:25:33,252 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [286.0, 334.0, 695.0, 85.0, 321.0, 463.0, 510.0, 466.0, 383.0, 527.0]
2025-09-14 00:25:33,252 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1226 [INFO]: New best (1224.11) for latency ExtremeSparseL4U32
2025-09-14 00:25:33,259 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 31/100 (estimated time remaining: 14 hours, 41 minutes, 40 seconds)
2025-09-14 00:36:35,760 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 00:36:35,762 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 00:39:44,212 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 1793.68298 ± 618.848
2025-09-14 00:39:44,219 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [1674.9181, 2984.6277, 703.6129, 2325.0056, 1581.2767, 2055.1443, 1326.3649, 1270.6896, 1668.6128, 2346.5764]
2025-09-14 00:39:44,219 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [575.0, 1000.0, 262.0, 833.0, 557.0, 718.0, 479.0, 428.0, 559.0, 795.0]
2025-09-14 00:39:44,219 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1226 [INFO]: New best (1793.68) for latency ExtremeSparseL4U32
2025-09-14 00:39:44,229 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 32/100 (estimated time remaining: 14 hours, 50 minutes)
2025-09-14 00:50:48,106 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 00:50:48,109 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 00:53:11,698 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 1427.01709 ± 605.115
2025-09-14 00:53:11,706 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [1135.8965, 1636.5803, 1115.0364, 1589.0889, 1222.5828, 2098.2854, 2803.4602, 1174.2731, 879.73364, 615.2348]
2025-09-14 00:53:11,706 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [390.0, 517.0, 390.0, 497.0, 395.0, 724.0, 860.0, 406.0, 323.0, 239.0]
2025-09-14 00:53:11,716 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 33/100 (estimated time remaining: 14 hours, 44 minutes, 17 seconds)
2025-09-14 01:03:41,003 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 01:03:41,011 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 01:06:21,990 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 1579.09265 ± 632.373
2025-09-14 01:06:21,992 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [1615.6619, 1152.4504, 895.2397, 1347.9332, 2147.866, 884.46, 1168.5106, 2139.961, 2989.259, 1449.5847]
2025-09-14 01:06:21,992 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [539.0, 404.0, 301.0, 503.0, 684.0, 293.0, 407.0, 725.0, 1000.0, 473.0]
2025-09-14 01:06:22,001 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 34/100 (estimated time remaining: 14 hours, 43 minutes, 13 seconds)
2025-09-14 01:17:14,877 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 01:17:14,884 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 01:19:47,380 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 1468.15759 ± 763.999
2025-09-14 01:19:47,381 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [121.9441, 2150.3152, 1012.73126, 2758.1506, 1085.9557, 971.6901, 1116.9102, 1214.3319, 2474.8938, 1774.6536]
2025-09-14 01:19:47,381 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [68.0, 736.0, 357.0, 876.0, 379.0, 337.0, 424.0, 404.0, 809.0, 628.0]
2025-09-14 01:19:47,397 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 35/100 (estimated time remaining: 14 hours, 45 minutes, 10 seconds)
2025-09-14 01:30:13,022 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 01:30:13,030 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 01:31:38,392 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 828.57922 ± 342.682
2025-09-14 01:31:38,393 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [863.7651, 865.9012, 21.278814, 680.1398, 767.2605, 1442.8344, 876.29425, 1164.7445, 794.0215, 809.5519]
2025-09-14 01:31:38,393 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [298.0, 305.0, 32.0, 229.0, 268.0, 439.0, 318.0, 383.0, 284.0, 287.0]
2025-09-14 01:31:38,405 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 36/100 (estimated time remaining: 14 hours, 19 minutes, 6 seconds)
2025-09-14 01:42:25,455 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 01:42:25,473 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 01:44:40,788 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 1313.40637 ± 651.694
2025-09-14 01:44:40,800 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [943.37305, 1425.8862, 122.704666, 917.1206, 1094.7733, 1109.2195, 2516.5088, 2212.716, 1681.6707, 1110.0907]
2025-09-14 01:44:40,800 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [327.0, 491.0, 75.0, 314.0, 380.0, 391.0, 820.0, 738.0, 525.0, 376.0]
2025-09-14 01:44:40,817 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 37/100 (estimated time remaining: 13 hours, 51 minutes, 16 seconds)
2025-09-14 01:55:28,196 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 01:55:28,198 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 01:59:00,831 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 2213.15063 ± 1051.571
2025-09-14 01:59:00,855 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [2013.4374, 641.4594, 3136.8962, 3138.5337, 3477.0369, 879.675, 1060.5459, 1534.9385, 3205.5972, 3043.3865]
2025-09-14 01:59:00,855 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [670.0, 259.0, 1000.0, 1000.0, 1000.0, 307.0, 352.0, 482.0, 1000.0, 901.0]
2025-09-14 01:59:00,855 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1226 [INFO]: New best (2213.15) for latency ExtremeSparseL4U32
2025-09-14 01:59:00,866 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 38/100 (estimated time remaining: 13 hours, 49 minutes, 19 seconds)
2025-09-14 02:10:01,830 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 02:10:01,833 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 02:14:15,097 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 2543.51904 ± 951.194
2025-09-14 02:14:15,099 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [1637.8944, 230.44753, 3245.8833, 1800.1798, 2912.3845, 3078.8062, 3118.5208, 3038.1885, 3128.3176, 3244.5662]
2025-09-14 02:14:15,099 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [565.0, 109.0, 1000.0, 632.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 02:14:15,099 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1226 [INFO]: New best (2543.52) for latency ExtremeSparseL4U32
2025-09-14 02:14:15,108 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 39/100 (estimated time remaining: 14 hours, 1 minute, 46 seconds)
2025-09-14 02:25:11,222 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 02:25:11,226 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 02:29:15,459 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 2320.92529 ± 812.801
2025-09-14 02:29:15,466 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3024.2012, 2275.9094, 3008.6936, 694.4601, 2815.4062, 2922.9302, 1222.7123, 1539.219, 2879.828, 2825.8953]
2025-09-14 02:29:15,466 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 787.0, 1000.0, 258.0, 1000.0, 1000.0, 419.0, 560.0, 1000.0, 1000.0]
2025-09-14 02:29:15,474 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 40/100 (estimated time remaining: 14 hours, 7 minutes, 30 seconds)
2025-09-14 02:39:29,392 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 02:39:29,402 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 02:43:13,955 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 2346.21558 ± 828.105
2025-09-14 02:43:13,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3058.815, 3072.0295, 2527.0093, 1268.8109, 3137.3906, 3124.2388, 2370.833, 1014.5728, 1137.6464, 2750.8103]
2025-09-14 02:43:13,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 750.0, 414.0, 1000.0, 1000.0, 715.0, 350.0, 376.0, 850.0]
2025-09-14 02:43:13,967 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 41/100 (estimated time remaining: 14 hours, 19 minutes, 6 seconds)
2025-09-14 02:54:22,677 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 02:54:22,683 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 02:57:23,971 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 1815.52307 ± 1176.184
2025-09-14 02:57:23,973 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3039.9077, 869.65515, 1362.139, 171.80247, 180.94843, 1616.821, 3041.9114, 3289.6003, 1371.4165, 3211.0298]
2025-09-14 02:57:23,973 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 314.0, 427.0, 97.0, 106.0, 523.0, 1000.0, 1000.0, 444.0, 1000.0]
2025-09-14 02:57:23,984 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 42/100 (estimated time remaining: 14 hours, 18 minutes, 5 seconds)
2025-09-14 03:08:48,111 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 03:08:48,115 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 03:13:41,172 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 2975.47192 ± 379.624
2025-09-14 03:13:41,180 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3132.6074, 2974.0017, 3117.014, 3098.2192, 3054.4363, 3106.2927, 1848.9562, 3169.3284, 3074.468, 3179.3938]
2025-09-14 03:13:41,180 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 958.0, 1000.0, 1000.0, 1000.0, 673.0, 1000.0, 1000.0, 1000.0]
2025-09-14 03:13:41,180 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1226 [INFO]: New best (2975.47) for latency ExtremeSparseL4U32
2025-09-14 03:13:41,190 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 43/100 (estimated time remaining: 14 hours, 26 minutes, 11 seconds)
2025-09-14 03:23:47,284 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 03:23:47,293 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 03:28:14,353 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 2733.86377 ± 618.380
2025-09-14 03:28:14,356 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3089.1277, 1926.5852, 1195.6031, 3111.606, 2746.0835, 3020.4119, 3090.0598, 2931.1638, 3110.5637, 3117.4329]
2025-09-14 03:28:14,356 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 622.0, 424.0, 1000.0, 872.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 03:28:14,363 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 44/100 (estimated time remaining: 14 hours, 3 minutes, 27 seconds)
2025-09-14 03:38:50,720 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 03:38:50,728 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 03:42:38,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 2434.19971 ± 1209.938
2025-09-14 03:42:38,836 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [17.302149, 3313.9072, 940.279, 941.3474, 3414.3447, 2891.604, 3261.2224, 3124.132, 3253.1243, 3184.7324]
2025-09-14 03:42:38,836 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [28.0, 1000.0, 317.0, 323.0, 1000.0, 1000.0, 1000.0, 1000.0, 970.0, 1000.0]
2025-09-14 03:42:38,845 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 45/100 (estimated time remaining: 13 hours, 41 minutes, 57 seconds)
2025-09-14 03:53:18,626 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 03:53:18,630 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 03:57:35,528 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 2804.89917 ± 891.572
2025-09-14 03:57:35,535 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3317.128, 1110.5105, 1025.7798, 3244.6162, 2741.4233, 3243.0007, 3336.0994, 3561.2034, 3371.7415, 3097.4883]
2025-09-14 03:57:35,536 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 397.0, 359.0, 1000.0, 829.0, 1000.0, 1000.0, 1000.0, 1000.0, 971.0]
2025-09-14 03:57:35,547 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 46/100 (estimated time remaining: 13 hours, 37 minutes, 57 seconds)
2025-09-14 04:08:21,343 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 04:08:21,353 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 04:11:51,854 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 2189.16016 ± 1013.197
2025-09-14 04:11:51,856 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [2233.9868, 3135.4905, 2810.567, 801.10614, 1555.1915, 634.5364, 3194.9644, 3215.2686, 1113.6892, 3196.8013]
2025-09-14 04:11:51,856 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [782.0, 1000.0, 880.0, 294.0, 530.0, 274.0, 1000.0, 968.0, 395.0, 970.0]
2025-09-14 04:11:51,876 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 47/100 (estimated time remaining: 13 hours, 24 minutes, 13 seconds)
2025-09-14 04:23:09,932 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 04:23:09,942 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 04:28:07,598 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3020.50171 ± 160.226
2025-09-14 04:28:07,600 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3158.9138, 3014.7603, 3063.9363, 3127.7253, 3042.2722, 2594.8376, 3147.3176, 3150.3933, 2940.9253, 2963.9358]
2025-09-14 04:28:07,600 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 853.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 04:28:07,600 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1226 [INFO]: New best (3020.50) for latency ExtremeSparseL4U32
2025-09-14 04:28:07,613 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 48/100 (estimated time remaining: 13 hours, 9 minutes, 4 seconds)
2025-09-14 04:38:43,941 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 04:38:43,947 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 04:42:50,111 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 2586.77637 ± 711.322
2025-09-14 04:42:50,117 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [2009.1569, 3122.3557, 2848.6926, 2672.8755, 1519.3562, 3160.6792, 3377.8154, 1284.2136, 3341.138, 2531.481]
2025-09-14 04:42:50,117 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [629.0, 1000.0, 863.0, 821.0, 538.0, 1000.0, 1000.0, 429.0, 1000.0, 771.0]
2025-09-14 04:42:50,126 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 49/100 (estimated time remaining: 12 hours, 55 minutes, 47 seconds)
2025-09-14 04:53:45,884 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 04:53:45,889 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 04:58:27,988 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3118.90894 ± 517.411
2025-09-14 04:58:28,013 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [1805.2568, 3320.121, 3436.72, 3379.0168, 2523.9111, 3036.377, 3319.66, 3447.3438, 3451.5862, 3469.097]
2025-09-14 04:58:28,013 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [562.0, 1000.0, 1000.0, 1000.0, 1000.0, 877.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 04:58:28,013 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1226 [INFO]: New best (3118.91) for latency ExtremeSparseL4U32
2025-09-14 04:58:28,037 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 50/100 (estimated time remaining: 12 hours, 53 minutes, 21 seconds)
2025-09-14 05:09:26,004 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 05:09:26,014 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 05:13:09,861 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 2389.15381 ± 1228.771
2025-09-14 05:13:09,872 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3301.7441, 3358.416, 2950.208, 3332.1719, 602.7229, 1222.6315, 3372.3137, 2357.361, 3376.9404, 17.02707]
2025-09-14 05:13:09,872 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 255.0, 427.0, 1000.0, 714.0, 1000.0, 29.0]
2025-09-14 05:13:09,885 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 51/100 (estimated time remaining: 12 hours, 35 minutes, 43 seconds)
2025-09-14 05:23:32,033 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 05:23:32,037 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 05:27:35,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 2623.28052 ± 872.463
2025-09-14 05:27:35,137 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3349.4011, 2864.9783, 3339.7566, 1620.7357, 3292.6475, 3319.7527, 3103.555, 2610.3647, 597.2436, 2134.3687]
2025-09-14 05:27:35,137 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 814.0, 1000.0, 544.0, 1000.0, 1000.0, 909.0, 822.0, 250.0, 645.0]
2025-09-14 05:27:35,151 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 52/100 (estimated time remaining: 12 hours, 22 minutes, 4 seconds)
2025-09-14 05:38:28,630 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 05:38:28,650 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 05:43:21,843 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3226.94336 ± 186.236
2025-09-14 05:43:21,858 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3294.6807, 3362.9487, 3290.9207, 3352.0356, 3019.618, 3289.1985, 3352.509, 2761.572, 3171.539, 3374.4106]
2025-09-14 05:43:21,858 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 829.0, 1000.0, 1000.0]
2025-09-14 05:43:21,858 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1226 [INFO]: New best (3226.94) for latency ExtremeSparseL4U32
2025-09-14 05:43:21,870 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 53/100 (estimated time remaining: 12 hours, 2 minutes, 16 seconds)
2025-09-14 05:54:30,688 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 05:54:30,697 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 05:58:28,998 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 2578.22559 ± 1082.807
2025-09-14 05:58:29,000 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3463.0488, 2041.6293, 2587.6682, 3258.0715, 160.29579, 1124.4294, 2880.471, 3412.9338, 3467.097, 3386.6118]
2025-09-14 05:58:29,000 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [998.0, 650.0, 789.0, 1000.0, 87.0, 369.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 05:58:29,013 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 54/100 (estimated time remaining: 11 hours, 51 minutes, 5 seconds)
2025-09-14 06:08:40,889 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 06:08:40,898 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 06:13:02,228 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 2849.99951 ± 986.650
2025-09-14 06:13:02,230 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3129.4304, 2569.6987, 2482.135, 3457.6897, 3363.1228, 62.858204, 3412.175, 3347.872, 3375.1082, 3299.9058]
2025-09-14 06:13:02,230 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 767.0, 749.0, 1000.0, 1000.0, 106.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 06:13:02,242 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 55/100 (estimated time remaining: 11 hours, 26 minutes, 2 seconds)
2025-09-14 06:24:26,357 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 06:24:26,361 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 06:28:46,453 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 2935.79883 ± 957.532
2025-09-14 06:28:46,455 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3390.095, 3355.405, 569.14056, 1585.8683, 3335.8145, 3398.6152, 3492.4014, 3366.8518, 3451.594, 3412.203]
2025-09-14 06:28:46,455 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 217.0, 503.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 06:28:46,467 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 56/100 (estimated time remaining: 11 hours, 20 minutes, 29 seconds)
2025-09-14 06:39:21,785 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 06:39:21,789 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 06:43:52,812 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 2987.27197 ± 556.449
2025-09-14 06:43:52,813 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3460.0454, 3313.029, 3404.1243, 1703.5653, 3411.6482, 3393.0767, 2424.2646, 2579.104, 3309.645, 2874.216]
2025-09-14 06:43:52,813 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 562.0, 1000.0, 1000.0, 689.0, 742.0, 1000.0, 846.0]
2025-09-14 06:43:52,822 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 57/100 (estimated time remaining: 11 hours, 11 minutes, 23 seconds)
2025-09-14 06:54:23,778 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 06:54:23,781 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 06:58:06,780 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 2466.61792 ± 1335.566
2025-09-14 06:58:06,787 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3266.9883, 810.7649, 3491.8967, 3405.91, 3264.9607, 3324.751, 280.22598, 249.2495, 3052.4526, 3518.9797]
2025-09-14 06:58:06,787 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [973.0, 278.0, 1000.0, 1000.0, 1000.0, 1000.0, 136.0, 137.0, 892.0, 1000.0]
2025-09-14 06:58:06,799 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 58/100 (estimated time remaining: 10 hours, 42 minutes, 50 seconds)
2025-09-14 07:08:48,322 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 07:08:48,330 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 07:12:41,634 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 2540.58936 ± 1035.013
2025-09-14 07:12:41,636 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [2019.255, 2731.6753, 3070.733, 1513.3788, 3488.1038, 19.241934, 3235.3394, 3259.9639, 3437.6753, 2630.5276]
2025-09-14 07:12:41,636 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [599.0, 804.0, 1000.0, 456.0, 1000.0, 29.0, 1000.0, 932.0, 1000.0, 798.0]
2025-09-14 07:12:41,646 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 59/100 (estimated time remaining: 10 hours, 23 minutes, 22 seconds)
2025-09-14 07:24:18,926 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 07:24:18,932 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 07:28:48,903 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 2925.42163 ± 676.262
2025-09-14 07:28:48,906 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [2406.7947, 2310.246, 3279.9304, 3136.445, 3271.209, 3366.5247, 1269.8744, 3471.977, 3423.4907, 3317.7263]
2025-09-14 07:28:48,906 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [773.0, 713.0, 1000.0, 1000.0, 1000.0, 1000.0, 410.0, 1000.0, 1000.0, 1000.0]
2025-09-14 07:28:48,920 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 60/100 (estimated time remaining: 10 hours, 21 minutes, 22 seconds)
2025-09-14 07:39:40,423 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 07:39:40,434 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 07:44:22,218 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3175.80811 ± 747.442
2025-09-14 07:44:22,220 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3451.3372, 3531.32, 3478.6133, 949.04034, 3526.1626, 3504.1375, 3290.828, 3315.464, 3411.5913, 3299.5881]
2025-09-14 07:44:22,220 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 404.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 07:44:22,231 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 61/100 (estimated time remaining: 10 hours, 4 minutes, 46 seconds)
2025-09-14 07:55:06,493 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 07:55:06,497 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 07:59:09,177 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 2757.93555 ± 1069.840
2025-09-14 07:59:09,180 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3461.6562, 1921.5369, 3334.8718, 3519.122, 3415.7312, 3545.6155, 544.2795, 1124.3811, 3379.2817, 3332.8796]
2025-09-14 07:59:09,180 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 574.0, 1000.0, 1000.0, 1000.0, 1000.0, 208.0, 373.0, 1000.0, 1000.0]
2025-09-14 07:59:09,189 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 62/100 (estimated time remaining: 9 hours, 47 minutes, 7 seconds)
2025-09-14 08:09:05,948 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 08:09:05,952 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 08:13:03,980 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 2699.97437 ± 1032.387
2025-09-14 08:13:03,996 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3530.8455, 788.47253, 3543.64, 3454.9531, 3225.8997, 1696.4283, 3465.1438, 3525.8982, 2642.5415, 1125.9214]
2025-09-14 08:13:03,996 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 273.0, 1000.0, 1000.0, 1000.0, 514.0, 1000.0, 1000.0, 762.0, 352.0]
2025-09-14 08:13:04,015 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 63/100 (estimated time remaining: 9 hours, 29 minutes, 38 seconds)
2025-09-14 08:23:59,144 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 08:23:59,158 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 08:27:15,984 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 2123.62256 ± 1283.237
2025-09-14 08:27:15,986 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3498.7832, 1614.7888, 2739.1658, 3387.8818, 15.264294, 251.17607, 1053.0327, 1910.5856, 3496.9165, 3268.6316]
2025-09-14 08:27:15,986 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 510.0, 807.0, 1000.0, 26.0, 125.0, 335.0, 624.0, 1000.0, 1000.0]
2025-09-14 08:27:15,996 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 64/100 (estimated time remaining: 9 hours, 11 minutes, 50 seconds)
2025-09-14 08:38:09,145 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 08:38:09,148 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 08:42:06,036 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 2680.99048 ± 956.015
2025-09-14 08:42:06,038 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3466.7969, 3392.4875, 3190.5347, 1919.6077, 3472.3486, 2611.448, 979.8807, 3408.1448, 1037.6754, 3330.979]
2025-09-14 08:42:06,038 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 576.0, 1000.0, 770.0, 359.0, 1000.0, 354.0, 1000.0]
2025-09-14 08:42:06,048 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 65/100 (estimated time remaining: 8 hours, 47 minutes, 39 seconds)
2025-09-14 08:53:34,577 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 08:53:34,581 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 08:58:03,578 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3097.33325 ± 1027.828
2025-09-14 08:58:03,579 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [20.325344, 3547.756, 3420.7593, 3473.5237, 3486.1926, 3388.1038, 3505.4329, 3402.8394, 3296.2217, 3432.1748]
2025-09-14 08:58:03,579 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [29.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 08:58:03,586 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 66/100 (estimated time remaining: 8 hours, 35 minutes, 49 seconds)
2025-09-14 09:08:18,902 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 09:08:18,923 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 09:12:47,154 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3055.33594 ± 748.575
2025-09-14 09:12:47,156 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3417.0261, 2960.4363, 3421.7632, 3345.1082, 2567.6555, 3424.4158, 974.9153, 3500.248, 3465.7358, 3476.0537]
2025-09-14 09:12:47,156 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 858.0, 1000.0, 1000.0, 750.0, 1000.0, 328.0, 1000.0, 1000.0, 1000.0]
2025-09-14 09:12:47,169 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 67/100 (estimated time remaining: 8 hours, 20 minutes, 42 seconds)
2025-09-14 09:23:18,461 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 09:23:18,466 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 09:27:18,811 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 2557.32178 ± 1180.340
2025-09-14 09:27:18,813 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3440.2356, 217.01205, 3039.1833, 3346.9104, 3525.1978, 1412.7557, 807.3613, 3170.6204, 3234.3623, 3379.5789]
2025-09-14 09:27:18,814 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 113.0, 981.0, 1000.0, 1000.0, 520.0, 300.0, 1000.0, 1000.0, 1000.0]
2025-09-14 09:27:18,823 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 68/100 (estimated time remaining: 8 hours, 10 minutes, 1 second)
2025-09-14 09:38:27,690 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 09:38:27,693 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 09:42:42,392 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 2936.45996 ± 1035.353
2025-09-14 09:42:42,393 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3523.675, 3427.4204, 3491.233, 1301.9558, 505.48196, 3216.2107, 3493.5505, 3399.7017, 3491.7896, 3513.5818]
2025-09-14 09:42:42,393 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 421.0, 193.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 09:42:42,399 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 69/100 (estimated time remaining: 8 hours, 2 minutes, 48 seconds)
2025-09-14 09:52:55,663 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 09:52:55,667 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 09:57:32,922 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3265.10278 ± 376.369
2025-09-14 09:57:32,953 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3597.7385, 3591.2832, 3573.4133, 2882.2449, 3466.6409, 3245.3403, 2401.4897, 3416.1694, 2967.4568, 3509.2495]
2025-09-14 09:57:32,953 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 837.0, 955.0, 1000.0, 685.0, 1000.0, 825.0, 1000.0]
2025-09-14 09:57:32,953 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1226 [INFO]: New best (3265.10) for latency ExtremeSparseL4U32
2025-09-14 09:57:32,970 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 70/100 (estimated time remaining: 7 hours, 47 minutes, 46 seconds)
2025-09-14 10:07:54,357 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 10:07:54,360 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 10:12:18,260 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 2940.93408 ± 793.114
2025-09-14 10:12:18,262 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3189.2722, 3548.1453, 1602.3851, 3599.7703, 3338.197, 2545.1318, 3485.628, 3401.917, 1322.1332, 3376.763]
2025-09-14 10:12:18,262 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 502.0, 1000.0, 1000.0, 746.0, 1000.0, 1000.0, 453.0, 1000.0]
2025-09-14 10:12:18,271 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 71/100 (estimated time remaining: 7 hours, 25 minutes, 28 seconds)
2025-09-14 10:22:59,989 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 10:22:59,993 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 10:26:50,660 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 2729.84326 ± 1200.318
2025-09-14 10:26:50,662 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3536.3306, 1446.089, 18.173004, 3658.2173, 3534.6663, 3643.705, 3557.5107, 3585.2952, 2573.4792, 1744.9681]
2025-09-14 10:26:50,662 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 433.0, 30.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 736.0, 539.0]
2025-09-14 10:26:50,673 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 72/100 (estimated time remaining: 7 hours, 9 minutes, 32 seconds)
2025-09-14 10:38:25,086 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 10:38:25,091 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 10:41:56,151 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 2442.81104 ± 1370.049
2025-09-14 10:41:56,157 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3597.1511, 3467.7942, 3568.3635, 3556.0696, 3093.6448, 166.70517, 1767.468, 19.617113, 3570.875, 1620.422]
2025-09-14 10:41:56,157 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 888.0, 83.0, 524.0, 30.0, 1000.0, 466.0]
2025-09-14 10:41:56,169 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 73/100 (estimated time remaining: 6 hours, 57 minutes, 53 seconds)
2025-09-14 10:52:43,011 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 10:52:43,021 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 10:57:16,455 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3165.72021 ± 772.381
2025-09-14 10:57:16,456 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3553.2725, 3374.222, 3596.9075, 3562.217, 3676.3918, 1266.0574, 3461.4646, 2078.7512, 3590.4685, 3497.449]
2025-09-14 10:57:16,457 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 422.0, 1000.0, 624.0, 1000.0, 1000.0]
2025-09-14 10:57:16,464 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 74/100 (estimated time remaining: 6 hours, 42 minutes, 39 seconds)
2025-09-14 11:07:13,431 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 11:07:13,439 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 11:11:03,304 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 2627.08521 ± 1367.353
2025-09-14 11:11:03,305 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3474.7717, 3543.5398, 3567.3848, 3432.092, 15.072517, 1001.3956, 3602.3325, 688.55884, 3453.2075, 3492.4966]
2025-09-14 11:11:03,305 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 964.0, 30.0, 326.0, 1000.0, 242.0, 1000.0, 1000.0]
2025-09-14 11:11:03,316 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 75/100 (estimated time remaining: 6 hours, 22 minutes, 13 seconds)
2025-09-14 11:21:50,996 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 11:21:50,999 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 11:25:45,076 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 2609.92725 ± 1076.079
2025-09-14 11:25:45,077 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3543.0159, 3513.1526, 1473.1208, 665.0614, 3200.2385, 1726.9874, 3404.3193, 3569.8262, 3529.944, 1473.6077]
2025-09-14 11:25:45,077 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 485.0, 261.0, 1000.0, 535.0, 1000.0, 1000.0, 1000.0, 494.0]
2025-09-14 11:25:45,088 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 76/100 (estimated time remaining: 6 hours, 7 minutes, 14 seconds)
2025-09-14 11:36:40,531 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 11:36:40,544 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 11:40:22,878 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 2500.26709 ± 1170.728
2025-09-14 11:40:22,880 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3459.2117, 3315.9797, 110.94645, 3416.8735, 3409.2698, 2889.489, 3491.204, 2522.2568, 879.3034, 1508.1382]
2025-09-14 11:40:22,880 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 68.0, 1000.0, 1000.0, 844.0, 1000.0, 739.0, 287.0, 465.0]
2025-09-14 11:40:22,892 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 77/100 (estimated time remaining: 5 hours, 52 minutes, 58 seconds)
2025-09-14 11:51:06,269 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 11:51:06,273 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 11:55:25,420 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3051.15088 ± 888.852
2025-09-14 11:55:25,427 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3441.4954, 3591.8516, 3499.5227, 3479.0076, 2416.3381, 3537.2473, 3464.666, 2824.8743, 3629.3867, 627.12036]
2025-09-14 11:55:25,427 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [972.0, 1000.0, 1000.0, 1000.0, 712.0, 1000.0, 1000.0, 807.0, 1000.0, 255.0]
2025-09-14 11:55:25,437 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 78/100 (estimated time remaining: 5 hours, 38 minutes, 2 seconds)
2025-09-14 12:05:47,616 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 12:05:47,626 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 12:10:39,959 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3426.36206 ± 259.254
2025-09-14 12:10:39,962 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3482.357, 2663.7212, 3613.531, 3559.8103, 3450.1138, 3561.8706, 3507.605, 3501.5032, 3482.5042, 3440.604]
2025-09-14 12:10:39,962 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 791.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 12:10:39,962 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1226 [INFO]: New best (3426.36) for latency ExtremeSparseL4U32
2025-09-14 12:10:39,972 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 79/100 (estimated time remaining: 5 hours, 22 minutes, 55 seconds)
2025-09-14 12:22:03,638 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 12:22:03,642 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 12:25:58,238 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 2619.76855 ± 1078.625
2025-09-14 12:25:58,239 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3519.6704, 3518.248, 2034.4857, 1660.2207, 3188.8674, 3390.2507, 3686.4714, 36.72022, 2689.746, 2473.0059]
2025-09-14 12:25:58,239 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 590.0, 532.0, 911.0, 1000.0, 1000.0, 52.0, 1000.0, 690.0]
2025-09-14 12:25:58,248 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 80/100 (estimated time remaining: 5 hours, 14 minutes, 38 seconds)
2025-09-14 12:36:18,916 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 12:36:18,921 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 12:41:09,352 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3354.40869 ± 432.163
2025-09-14 12:41:09,353 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3470.8916, 3501.6995, 3586.241, 3270.2686, 3540.2031, 3433.331, 3544.5337, 3487.9956, 2087.3489, 3621.573]
2025-09-14 12:41:09,353 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 617.0, 1000.0]
2025-09-14 12:41:09,362 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 81/100 (estimated time remaining: 5 hours, 1 minute, 37 seconds)
2025-09-14 12:51:30,560 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 12:51:30,584 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 12:56:14,411 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3162.60156 ± 577.112
2025-09-14 12:56:14,412 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3466.8909, 3474.3105, 3533.2458, 3343.3035, 3337.0044, 3424.3906, 3478.367, 2879.8372, 3166.847, 1521.818]
2025-09-14 12:56:14,413 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 815.0, 1000.0, 523.0]
2025-09-14 12:56:14,426 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 82/100 (estimated time remaining: 4 hours, 48 minutes, 15 seconds)
2025-09-14 13:07:48,955 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 13:07:48,959 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 13:11:47,634 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 2761.71729 ± 909.909
2025-09-14 13:11:47,636 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [2889.8906, 1754.4785, 3475.2722, 3163.2737, 3506.5063, 3618.8235, 3059.199, 808.70984, 3497.9365, 1843.0825]
2025-09-14 13:11:47,636 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [800.0, 549.0, 1000.0, 1000.0, 1000.0, 1000.0, 850.0, 300.0, 1000.0, 550.0]
2025-09-14 13:11:47,645 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 83/100 (estimated time remaining: 4 hours, 34 minutes, 55 seconds)
2025-09-14 13:21:38,389 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 13:21:38,393 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 13:25:54,233 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 2922.72241 ± 987.989
2025-09-14 13:25:54,236 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3411.352, 1176.7766, 3489.0183, 3474.5005, 3534.5725, 2700.9094, 3530.2427, 3495.914, 846.2369, 3567.7031]
2025-09-14 13:25:54,237 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 371.0, 1000.0, 1000.0, 1000.0, 778.0, 1000.0, 1000.0, 321.0, 1000.0]
2025-09-14 13:25:54,250 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 84/100 (estimated time remaining: 4 hours, 15 minutes, 48 seconds)
2025-09-14 13:36:42,758 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 13:36:42,779 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 13:39:53,026 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 2169.43262 ± 1140.778
2025-09-14 13:39:53,033 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [1237.1208, 1816.1112, 3353.9175, 3098.1865, 2523.823, 996.44824, 1693.1235, 3509.0884, 3443.6025, 22.9046]
2025-09-14 13:39:53,033 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [385.0, 556.0, 1000.0, 905.0, 726.0, 326.0, 519.0, 1000.0, 1000.0, 31.0]
2025-09-14 13:39:53,042 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 85/100 (estimated time remaining: 3 hours, 56 minutes, 31 seconds)
2025-09-14 13:50:19,915 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 13:50:19,925 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 13:54:42,469 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3054.00659 ± 725.634
2025-09-14 13:54:42,470 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [1454.8356, 2491.3394, 3588.9592, 3459.9502, 3548.2583, 3463.4993, 3376.2146, 2078.1157, 3516.2012, 3562.6934]
2025-09-14 13:54:42,471 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [436.0, 773.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 624.0, 1000.0, 1000.0]
2025-09-14 13:54:42,483 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 86/100 (estimated time remaining: 3 hours, 40 minutes, 39 seconds)
2025-09-14 14:05:43,692 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 14:05:43,702 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 14:10:25,495 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3277.21558 ± 793.090
2025-09-14 14:10:25,496 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3581.6318, 3511.8196, 3547.0303, 3552.733, 3473.5618, 3492.7769, 3529.6396, 3634.3027, 3547.2312, 901.4316]
2025-09-14 14:10:25,496 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 312.0]
2025-09-14 14:10:25,505 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 87/100 (estimated time remaining: 3 hours, 27 minutes, 43 seconds)
2025-09-14 14:20:47,888 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 14:20:47,892 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 14:25:15,923 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3029.70703 ± 824.094
2025-09-14 14:25:15,924 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [2925.6257, 3230.1362, 3517.0283, 3472.6445, 844.4869, 3580.0193, 3496.4177, 3426.8408, 3542.4126, 2261.46]
2025-09-14 14:25:15,924 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [826.0, 1000.0, 1000.0, 1000.0, 299.0, 1000.0, 1000.0, 1000.0, 1000.0, 641.0]
2025-09-14 14:25:15,935 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 88/100 (estimated time remaining: 3 hours, 11 minutes, 1 second)
2025-09-14 14:36:11,475 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 14:36:11,478 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 14:40:25,819 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3018.49878 ± 1120.166
2025-09-14 14:40:25,821 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3627.153, 3515.4954, 3639.11, 3646.8865, 3646.549, 749.5923, 3435.122, 3530.9941, 3579.4397, 814.64667]
2025-09-14 14:40:25,821 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 254.0, 1000.0, 1000.0, 1000.0, 277.0]
2025-09-14 14:40:25,831 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 89/100 (estimated time remaining: 2 hours, 58 minutes, 51 seconds)
2025-09-14 14:50:57,333 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 14:50:57,337 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 14:55:17,828 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3109.74951 ± 935.829
2025-09-14 14:55:17,829 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3516.383, 3604.354, 3430.1194, 3539.723, 3425.4866, 2613.8225, 3594.3447, 426.97083, 3569.022, 3377.27]
2025-09-14 14:55:17,829 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 714.0, 1000.0, 177.0, 1000.0, 931.0]
2025-09-14 14:55:17,840 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 90/100 (estimated time remaining: 2 hours, 45 minutes, 54 seconds)
2025-09-14 15:06:41,518 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 15:06:41,528 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 15:11:28,326 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3418.91528 ± 396.095
2025-09-14 15:11:28,345 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3581.8428, 3574.9116, 3598.2534, 2255.216, 3571.7515, 3485.55, 3345.633, 3574.8845, 3536.5547, 3664.5535]
2025-09-14 15:11:28,345 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 665.0, 1000.0, 1000.0, 937.0, 1000.0, 1000.0, 1000.0]
2025-09-14 15:11:28,357 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 91/100 (estimated time remaining: 2 hours, 33 minutes, 31 seconds)
2025-09-14 15:21:52,483 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 15:21:52,492 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 15:26:45,464 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3530.97583 ± 218.427
2025-09-14 15:26:45,488 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3579.6084, 3612.376, 3584.8235, 3610.4534, 2886.2908, 3617.6206, 3624.9763, 3690.8906, 3535.0918, 3567.6252]
2025-09-14 15:26:45,489 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 848.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 15:26:45,489 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1226 [INFO]: New best (3530.98) for latency ExtremeSparseL4U32
2025-09-14 15:26:45,501 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 92/100 (estimated time remaining: 2 hours, 17 minutes, 23 seconds)
2025-09-14 15:36:48,664 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 15:36:48,674 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 15:41:52,829 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3576.84253 ± 42.480
2025-09-14 15:41:52,832 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3507.287, 3603.4446, 3570.5752, 3543.0972, 3561.0076, 3539.6616, 3597.2751, 3661.3232, 3563.6428, 3621.113]
2025-09-14 15:41:52,832 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 15:41:52,832 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1226 [INFO]: New best (3576.84) for latency ExtremeSparseL4U32
2025-09-14 15:41:52,845 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 93/100 (estimated time remaining: 2 hours, 2 minutes, 35 seconds)
2025-09-14 15:52:42,306 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 15:52:42,317 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 15:57:35,000 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3479.74658 ± 290.440
2025-09-14 15:57:35,001 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3654.2966, 3617.6382, 3499.8896, 3459.7039, 3615.6084, 3559.8835, 3537.9702, 3688.3801, 3532.121, 2631.9722]
2025-09-14 15:57:35,001 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 744.0]
2025-09-14 15:57:35,011 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 48 minutes)
2025-09-14 16:08:07,201 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 16:08:07,222 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 16:11:23,734 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 2211.38818 ± 1327.869
2025-09-14 16:11:23,735 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3476.7725, 364.54037, 2268.3728, 235.72636, 3510.1626, 1282.9878, 3486.0432, 3534.7605, 3134.7104, 819.8046]
2025-09-14 16:11:23,735 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 159.0, 677.0, 127.0, 1000.0, 393.0, 1000.0, 1000.0, 936.0, 313.0]
2025-09-14 16:11:23,744 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 31 minutes, 19 seconds)
2025-09-14 16:22:19,349 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 16:22:19,358 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 16:27:00,535 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3468.33154 ± 523.949
2025-09-14 16:27:00,536 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3652.9014, 3674.72, 3606.246, 3601.9011, 3623.2988, 3646.5454, 3664.847, 3604.4282, 3708.8376, 1899.5931]
2025-09-14 16:27:00,536 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [985.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 574.0]
2025-09-14 16:27:00,547 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 15 minutes, 32 seconds)
2025-09-14 16:37:40,641 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 16:37:40,656 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 16:42:10,047 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3165.95312 ± 848.645
2025-09-14 16:42:10,048 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3547.9424, 738.20294, 3474.1484, 3496.315, 2684.0056, 3557.0898, 3561.2817, 3542.7898, 3603.0408, 3454.712]
2025-09-14 16:42:10,048 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 281.0, 966.0, 1000.0, 730.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 16:42:10,058 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 97/100 (estimated time remaining: 1 hour, 19 seconds)
2025-09-14 16:52:18,758 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 16:52:18,786 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 16:57:05,457 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3399.76514 ± 560.762
2025-09-14 16:57:05,458 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3630.5574, 3492.52, 3645.4814, 1726.3833, 3543.9856, 3484.622, 3634.8586, 3574.0566, 3629.2942, 3635.891]
2025-09-14 16:57:05,458 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 488.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 16:57:05,488 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 98/100 (estimated time remaining: 45 minutes, 7 seconds)
2025-09-14 17:07:57,885 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 17:07:57,895 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 17:12:40,332 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3305.26367 ± 590.633
2025-09-14 17:12:40,333 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [1573.6636, 3441.0955, 3386.892, 3544.8313, 3205.086, 3663.047, 3577.955, 3580.39, 3608.3042, 3471.3706]
2025-09-14 17:12:40,333 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [499.0, 949.0, 1000.0, 1000.0, 892.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 17:12:40,355 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 99/100 (estimated time remaining: 30 minutes, 2 seconds)
2025-09-14 17:23:48,871 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 17:23:48,883 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 17:28:31,159 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3483.17236 ± 520.311
2025-09-14 17:28:31,167 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3715.429, 3672.9053, 3750.6167, 3575.7178, 3691.3767, 2047.77, 3640.4546, 3854.77, 3806.8196, 3075.8645]
2025-09-14 17:28:31,167 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 986.0, 1000.0, 589.0, 1000.0, 1000.0, 1000.0, 798.0]
2025-09-14 17:28:31,182 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 100/100 (estimated time remaining: 15 minutes, 25 seconds)
2025-09-14 17:39:10,216 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 17:39:10,226 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 17:43:53,906 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3379.30029 ± 812.668
2025-09-14 17:43:53,908 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3613.958, 3727.8752, 3678.9119, 3673.6008, 3614.8633, 3416.4978, 3690.2517, 954.9803, 3724.3906, 3697.673]
2025-09-14 17:43:53,908 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 305.0, 1000.0, 1000.0]
2025-09-14 17:43:53,922 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1251 [DEBUG]: Training session finished
