2025-09-12 03:04:42,297 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noiseperc10-walker2d/MM1Queue_a033_s075-mbpac_memdelay
2025-09-12 03:04:42,297 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noiseperc10-walker2d/MM1Queue_a033_s075-mbpac_memdelay
2025-09-12 03:04:42,297 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x150f63719390>}
2025-09-12 03:04:42,297 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1111 [DEBUG]: using device: cuda
2025-09-12 03:04:42,302 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1133 [INFO]: Creating new trainer
2025-09-12 03:04:42,323 baseline-mbpac-noiseperc10-walker2d:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
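The printed policy can be sketched as a plain forward pass. This is a minimal NumPy sketch of the printed `NNGaussianPolicy` shapes, not the trained model: the class name, helper functions, and random initialization are illustrative, and the exact `NNTanhRefit` semantics are internal to the codebase; here it is assumed to rescale a `[0, 1]`-squashed sample by `scale * x + shift`, which with scale 2 and shift -1 lands the 6-dim Walker2d action in `[-1, 1]`.

```python
import numpy as np

rng = np.random.default_rng(0)

def linear(n_in, n_out):
    # placeholder He-style init; not the trained parameters
    return rng.standard_normal((n_in, n_out)) * np.sqrt(2.0 / n_in), np.zeros(n_out)

def relu(x):
    return np.maximum(x, 0.0)

class GaussianPolicy:
    """Sketch of the printed NNGaussianPolicy: shared 384 -> 256 -> 256 ReLU trunk,
    separate mu / log_std heads (256 -> 6), and an assumed tanh refit onto [-1, 1]."""
    def __init__(self, obs_dim=384, act_dim=6, hidden=256):
        self.l1 = linear(obs_dim, hidden)
        self.l2 = linear(hidden, hidden)
        self.mu_head = linear(hidden, act_dim)
        self.log_std_head = linear(hidden, act_dim)
        self.scale, self.shift = 2.0, -1.0   # from the printed NNTanhRefit tensors

    def forward(self, obs):
        h = obs.reshape(obs.shape[0], -1)            # Flatten(start_dim=1)
        h = relu(h @ self.l1[0] + self.l1[1])
        h = relu(h @ self.l2[0] + self.l2[1])
        mu = h @ self.mu_head[0] + self.mu_head[1]
        log_std = h @ self.log_std_head[0] + self.log_std_head[1]
        return mu, log_std

    def act(self, obs):
        mu, log_std = self.forward(obs)
        u = mu + np.exp(log_std) * rng.standard_normal(mu.shape)  # reparameterized sample
        # assumed refit: scale * (tanh(u) + 1) / 2 + shift maps into [-1, 1]
        return self.scale * (np.tanh(u) + 1.0) / 2.0 + self.shift
```

Note the 384-dim input: the policy does not consume the raw 17-dim Walker2d observation but a 384-dim feature, matching the recurrent model's hidden size printed further down.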
2025-09-12 03:04:42,323 baseline-mbpac-noiseperc10-walker2d:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
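The critic's input width checks out: 23 = 17 (raw Walker2d observation) + 6 (action), so unlike the policy it operates on the un-embedded state. A minimal NumPy sketch of the printed `NNLayerConcat2` critic follows; the class and helper names are illustrative and the weights are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def linear(n_in, n_out):
    # placeholder He-style init; not the trained parameters
    return rng.standard_normal((n_in, n_out)) * np.sqrt(2.0 / n_in), np.zeros(n_out)

def relu(x):
    return np.maximum(x, 0.0)

class QNetwork:
    """Sketch of the printed NNLayerConcat2 critic: flatten state and action,
    concatenate on the last axis (dim=-1), run a 23 -> 256 -> 256 -> 1 ReLU MLP,
    then squeeze the trailing singleton dim (NNLayerSqueeze(dim=-1))."""
    def __init__(self, state_dim=17, act_dim=6, hidden=256):
        self.l1 = linear(state_dim + act_dim, hidden)
        self.l2 = linear(hidden, hidden)
        self.l3 = linear(hidden, 1)

    def forward(self, state, action):
        x = np.concatenate([state.reshape(state.shape[0], -1),    # init_left: Flatten
                            action.reshape(action.shape[0], -1)], # init_right: Flatten
                           axis=-1)
        h = relu(x @ self.l1[0] + self.l1[1])
        h = relu(h @ self.l2[0] + self.l2[1])
        q = h @ self.l3[0] + self.l3[1]
        return q.squeeze(-1)   # shape (batch,), one Q-value per (s, a) pair
```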
2025-09-12 03:04:42,332 baseline-mbpac-noiseperc10-walker2d:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=6, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 384, batch_first=True)
)
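The printed shapes suggest one plausible rollout wiring for `NNPredictiveRecurrent`: `net_embed_state` maps the 17-dim observation to the 384-dim GRU hidden state, `net_embed_action` produces the 256-dim GRU input, and the Gaussian emitter reads the hidden state to predict the next observation. The sketch below is an inference from those shapes, not the actual implementation; the `ClipSiLU` semantics (clipping the pre-activation below -20 before SiLU) and the GRU gate convention (PyTorch's) are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def clip_silu(x, lower=-20.0):
    # assumed NNLayerClipSiLU: clip the pre-activation at `lower`, then SiLU
    x = np.maximum(x, lower)
    return x * sigmoid(x)

def mlp(dims):
    # placeholder-initialized linear layers; activation between, none after the last
    return [(rng.standard_normal((a, b)) / np.sqrt(a), np.zeros(b))
            for a, b in zip(dims[:-1], dims[1:])]

def run_mlp(layers, x):
    for i, (W, b) in enumerate(layers):
        x = x @ W + b
        if i < len(layers) - 1:
            x = clip_silu(x)
    return x

class GRUCell:
    """Single-step GRU with PyTorch's gate layout (reset | update | candidate)."""
    def __init__(self, n_in, n_hid):
        k = 1.0 / np.sqrt(n_hid)
        self.Wi = rng.uniform(-k, k, (n_in, 3 * n_hid))
        self.Wh = rng.uniform(-k, k, (n_hid, 3 * n_hid))
        self.bi = np.zeros(3 * n_hid)
        self.bh = np.zeros(3 * n_hid)

    def step(self, x, h):
        i_r, i_z, i_n = np.split(x @ self.Wi + self.bi, 3, axis=-1)
        h_r, h_z, h_n = np.split(h @ self.Wh + self.bh, 3, axis=-1)
        r = sigmoid(i_r + h_r)
        z = sigmoid(i_z + h_z)
        n = np.tanh(i_n + r * h_n)
        return (1.0 - z) * n + z * h

class PredictiveRecurrent:
    """Assumed wiring for the printed NNPredictiveRecurrent shapes."""
    def __init__(self, obs_dim=17, act_dim=6, emb=256, hid=384):
        self.embed_state = mlp([obs_dim, emb, emb, hid])    # net_embed_state
        self.embed_action = mlp([act_dim, emb, emb])        # net_embed_action
        self.gru = GRUCell(emb, hid)                        # net_rec: GRU(256, 384)
        self.trunk = mlp([hid, emb, emb, emb])              # emitter trunk
        self.head_mu = mlp([emb, emb, obs_dim])             # mu head
        self.head_log_std = mlp([emb, emb, obs_dim])        # log_std head

    def init_hidden(self, obs):
        return run_mlp(self.embed_state, obs)

    def step(self, action, h):
        h = self.gru.step(run_mlp(self.embed_action, action), h)
        e = clip_silu(run_mlp(self.trunk, h))   # outer ClipSiLU before the head split
        return run_mlp(self.head_mu, e), run_mlp(self.head_log_std, e), h
```

Under this reading, rolling the model forward only needs a sequence of actions once the hidden state is seeded from an observation, which fits the memory-delay ("memdelay") setup where future states must be imagined under latency.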
2025-09-12 03:04:43,260 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1194 [DEBUG]: Starting training session...
2025-09-12 03:04:43,261 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 1/100
2025-09-12 03:15:05,758 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:15:05,766 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:15:43,177 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 19.28110 ± 79.333
2025-09-12 03:15:43,178 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [86.288635, -13.148372, 6.210273, 235.51024, -3.087309, -6.1738253, -17.904833, -21.195229, -36.29524, -37.393375]
2025-09-12 03:15:43,178 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [284.0, 121.0, 121.0, 156.0, 113.0, 118.0, 95.0, 120.0, 117.0, 101.0]
2025-09-12 03:15:43,178 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (19.28) for latency MM1Queue_a033_s075
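The "Total Reward" line is reproducible from the per-episode list above it; the ± term matches the population standard deviation (NumPy's default `ddof=0`), not the sample one:

```python
import numpy as np

# per-episode returns from the iteration-1 evaluation logged above
rewards = np.array([86.288635, -13.148372, 6.210273, 235.51024, -3.087309,
                    -6.1738253, -17.904833, -21.195229, -36.29524, -37.393375])

mean = rewards.mean()
std = rewards.std()   # population std (ddof=0) reproduces the logged value

print(f"Total Reward: {mean:.5f} \u00b1 {std:.3f}")
# -> Total Reward: 19.28110 ± 79.333
```

The large single-episode spread (one 235.5 return against several negative ones, driven by a 284-step trajectory versus ~100-step early terminations) explains why the std exceeds the mean at iteration 1.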
2025-09-12 03:15:43,185 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 2/100 (estimated time remaining: 18 hours, 8 minutes, 52 seconds)
2025-09-12 03:27:39,867 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:27:39,870 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:28:27,758 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 123.19711 ± 164.885
2025-09-12 03:28:27,759 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [87.02537, 15.150864, 12.870577, 106.234314, 43.94943, 13.952675, 329.7806, 89.326584, 534.34906, -0.6682761]
2025-09-12 03:28:27,759 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [208.0, 90.0, 92.0, 235.0, 171.0, 24.0, 185.0, 151.0, 417.0, 141.0]
2025-09-12 03:28:27,759 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (123.20) for latency MM1Queue_a033_s075
2025-09-12 03:28:27,763 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 3/100 (estimated time remaining: 19 hours, 23 minutes, 20 seconds)
2025-09-12 03:40:05,625 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:40:05,627 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:41:11,717 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 279.45584 ± 227.062
2025-09-12 03:41:11,727 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [915.5642, 49.300415, 146.89157, 251.14572, 379.01077, 166.66846, 176.34982, 251.4389, 213.7214, 244.4673]
2025-09-12 03:41:11,727 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [857.0, 161.0, 104.0, 158.0, 349.0, 108.0, 163.0, 162.0, 137.0, 173.0]
2025-09-12 03:41:11,727 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (279.46) for latency MM1Queue_a033_s075
2025-09-12 03:41:11,737 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 4/100 (estimated time remaining: 19 hours, 39 minutes, 20 seconds)
2025-09-12 03:53:59,216 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:53:59,221 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:54:58,883 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 249.77713 ± 152.174
2025-09-12 03:54:58,883 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [494.08627, 293.63095, 261.69183, 68.03064, 174.12086, 503.04874, 194.11751, 113.3816, 56.735912, 338.9269]
2025-09-12 03:54:58,883 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [321.0, 175.0, 185.0, 99.0, 276.0, 317.0, 304.0, 145.0, 133.0, 186.0]
2025-09-12 03:54:58,897 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 5/100 (estimated time remaining: 20 hours, 6 minutes, 15 seconds)
2025-09-12 04:06:16,766 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:06:16,771 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:07:35,979 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 441.62491 ± 185.118
2025-09-12 04:07:35,980 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [508.57092, 818.87854, 454.24402, 494.93463, 197.1036, 460.1034, 325.32904, 435.5706, 130.41795, 591.09595]
2025-09-12 04:07:35,980 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [276.0, 640.0, 277.0, 276.0, 237.0, 255.0, 171.0, 221.0, 131.0, 360.0]
2025-09-12 04:07:35,980 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (441.62) for latency MM1Queue_a033_s075
2025-09-12 04:07:35,984 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 6/100 (estimated time remaining: 19 hours, 54 minutes, 41 seconds)
2025-09-12 04:18:49,467 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:18:49,469 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:19:41,623 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 242.21870 ± 151.505
2025-09-12 04:19:41,623 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [331.97293, 122.451416, 449.3991, 1.962412, 247.2054, 358.3123, 66.15368, 410.3472, 89.77305, 344.6096]
2025-09-12 04:19:41,623 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [187.0, 168.0, 250.0, 11.0, 326.0, 196.0, 107.0, 234.0, 117.0, 251.0]
2025-09-12 04:19:41,629 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 7/100 (estimated time remaining: 20 hours, 2 minutes, 42 seconds)
2025-09-12 04:31:35,324 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:31:35,326 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:32:43,271 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 387.18283 ± 84.610
2025-09-12 04:32:43,272 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [329.73148, 493.84216, 197.5686, 367.36942, 475.71884, 393.59702, 363.25473, 381.24966, 373.29492, 496.20148]
2025-09-12 04:32:43,272 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [178.0, 270.0, 130.0, 201.0, 367.0, 215.0, 218.0, 226.0, 275.0, 352.0]
2025-09-12 04:32:43,288 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 8/100 (estimated time remaining: 19 hours, 55 minutes, 12 seconds)
2025-09-12 04:44:32,487 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:44:32,499 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:45:27,871 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 272.70453 ± 114.226
2025-09-12 04:45:27,871 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [332.93665, 281.66876, 109.08308, 369.94922, 382.63797, 261.86993, 282.7847, 172.82971, 451.2864, 81.998695]
2025-09-12 04:45:27,871 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [361.0, 175.0, 165.0, 203.0, 231.0, 156.0, 230.0, 108.0, 265.0, 101.0]
2025-09-12 04:45:27,896 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 9/100 (estimated time remaining: 19 hours, 42 minutes, 33 seconds)
2025-09-12 04:57:19,384 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:57:19,387 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:58:30,288 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 389.55426 ± 145.023
2025-09-12 04:58:30,289 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [254.55168, 226.64201, 522.6404, 402.56442, 621.1879, 325.61346, 189.42688, 307.49478, 460.74094, 584.68005]
2025-09-12 04:58:30,289 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [139.0, 147.0, 320.0, 252.0, 397.0, 189.0, 140.0, 223.0, 275.0, 437.0]
2025-09-12 04:58:30,299 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 10/100 (estimated time remaining: 19 hours, 16 minutes, 7 seconds)
2025-09-12 05:10:17,255 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:10:17,264 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:11:19,615 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 361.44077 ± 157.156
2025-09-12 05:11:19,615 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [66.68367, 466.37775, 375.7345, 439.88544, 629.96594, 382.72446, 454.1213, 119.2197, 371.27637, 308.41846]
2025-09-12 05:11:19,615 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [129.0, 247.0, 191.0, 225.0, 360.0, 217.0, 282.0, 150.0, 261.0, 180.0]
2025-09-12 05:11:19,624 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 11/100 (estimated time remaining: 19 hours, 7 minutes, 5 seconds)
2025-09-12 05:23:12,329 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:23:12,337 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:24:09,864 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 338.39603 ± 59.010
2025-09-12 05:24:09,865 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [366.70972, 455.63705, 293.67862, 325.46432, 301.91434, 365.8117, 231.53352, 356.83005, 389.12778, 297.25317]
2025-09-12 05:24:09,865 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [261.0, 253.0, 211.0, 196.0, 201.0, 248.0, 137.0, 167.0, 213.0, 165.0]
2025-09-12 05:24:09,874 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 12/100 (estimated time remaining: 19 hours, 7 minutes, 34 seconds)
2025-09-12 05:35:59,310 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:35:59,328 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:36:50,057 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 335.74008 ± 95.686
2025-09-12 05:36:50,057 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [398.4603, 461.33386, 358.99997, 275.25098, 394.4216, 414.20044, 271.97992, 284.89612, 383.70367, 114.154076]
2025-09-12 05:36:50,057 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [179.0, 264.0, 174.0, 135.0, 204.0, 190.0, 145.0, 165.0, 203.0, 146.0]
2025-09-12 05:36:50,066 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 13/100 (estimated time remaining: 18 hours, 48 minutes, 23 seconds)
2025-09-12 05:48:48,651 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:48:48,659 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:49:42,975 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 299.46005 ± 135.064
2025-09-12 05:49:42,975 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [73.37938, 403.91745, 235.74861, 151.36, 418.52148, 237.52777, 554.5484, 381.21674, 231.72289, 306.65784]
2025-09-12 05:49:42,975 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [151.0, 242.0, 141.0, 188.0, 348.0, 117.0, 282.0, 179.0, 117.0, 158.0]
2025-09-12 05:49:43,017 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 14/100 (estimated time remaining: 18 hours, 37 minutes, 59 seconds)
2025-09-12 06:01:43,125 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:01:43,128 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:02:38,751 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 412.96094 ± 187.727
2025-09-12 06:02:38,752 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [763.14343, 299.7758, 639.31213, 365.5612, 39.67048, 367.29874, 360.57925, 328.4745, 517.5319, 448.26187]
2025-09-12 06:02:38,752 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [318.0, 143.0, 301.0, 194.0, 53.0, 175.0, 183.0, 185.0, 230.0, 196.0]
2025-09-12 06:02:38,757 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 15/100 (estimated time remaining: 18 hours, 23 minutes, 13 seconds)
2025-09-12 06:14:22,117 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:14:22,137 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:15:26,290 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 456.88223 ± 175.719
2025-09-12 06:15:26,292 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [370.15207, 50.316395, 686.1192, 489.97028, 461.16135, 345.46686, 537.79144, 685.4888, 398.16415, 544.1919]
2025-09-12 06:15:26,292 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [233.0, 70.0, 276.0, 258.0, 215.0, 208.0, 243.0, 303.0, 157.0, 308.0]
2025-09-12 06:15:26,292 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (456.88) for latency MM1Queue_a033_s075
2025-09-12 06:15:26,296 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 16/100 (estimated time remaining: 18 hours, 9 minutes, 53 seconds)
2025-09-12 06:27:09,662 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:27:09,665 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:28:11,380 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 486.00449 ± 98.233
2025-09-12 06:28:11,380 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [464.10083, 386.80408, 578.1613, 555.3386, 483.79688, 415.20837, 439.6571, 717.542, 394.5002, 424.9354]
2025-09-12 06:28:11,380 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [228.0, 162.0, 299.0, 214.0, 177.0, 223.0, 212.0, 294.0, 185.0, 204.0]
2025-09-12 06:28:11,381 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (486.00) for latency MM1Queue_a033_s075
2025-09-12 06:28:11,422 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 17/100 (estimated time remaining: 17 hours, 55 minutes, 38 seconds)
2025-09-12 06:40:05,944 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:40:05,946 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:41:20,422 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 613.33905 ± 181.756
2025-09-12 06:41:20,423 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [474.428, 853.6824, 489.18198, 703.55054, 836.2913, 791.09, 476.51917, 304.9406, 732.89044, 470.81656]
2025-09-12 06:41:20,423 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [183.0, 417.0, 225.0, 264.0, 353.0, 314.0, 213.0, 176.0, 338.0, 177.0]
2025-09-12 06:41:20,423 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (613.34) for latency MM1Queue_a033_s075
2025-09-12 06:41:20,427 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 18/100 (estimated time remaining: 17 hours, 50 minutes, 47 seconds)
2025-09-12 06:53:05,330 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:53:05,332 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:54:41,919 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 782.52942 ± 204.847
2025-09-12 06:54:41,920 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [994.28345, 763.3291, 996.6243, 973.30493, 713.09094, 868.70105, 697.18317, 940.99896, 363.52457, 514.254]
2025-09-12 06:54:41,920 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [494.0, 312.0, 583.0, 395.0, 317.0, 347.0, 298.0, 374.0, 136.0, 218.0]
2025-09-12 06:54:41,920 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (782.53) for latency MM1Queue_a033_s075
2025-09-12 06:54:41,928 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 19/100 (estimated time remaining: 17 hours, 45 minutes, 42 seconds)
2025-09-12 07:06:31,388 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:06:31,390 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:07:54,084 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 733.84937 ± 158.809
2025-09-12 07:07:54,084 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [993.9671, 840.33093, 789.03217, 675.491, 357.7399, 640.6476, 707.1408, 723.8879, 752.06104, 858.1951]
2025-09-12 07:07:54,084 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [390.0, 332.0, 300.0, 241.0, 140.0, 270.0, 310.0, 297.0, 338.0, 353.0]
2025-09-12 07:07:54,093 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 20/100 (estimated time remaining: 17 hours, 37 minutes, 8 seconds)
2025-09-12 07:19:43,647 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:19:43,656 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:20:40,257 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 512.66516 ± 139.878
2025-09-12 07:20:40,257 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [578.0627, 629.86255, 645.3001, 570.2382, 504.82007, 660.9005, 276.2068, 595.0972, 271.86017, 394.30304]
2025-09-12 07:20:40,257 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [221.0, 234.0, 245.0, 226.0, 211.0, 250.0, 114.0, 226.0, 118.0, 161.0]
2025-09-12 07:20:40,262 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 21/100 (estimated time remaining: 17 hours, 23 minutes, 43 seconds)
2025-09-12 07:32:37,209 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:32:37,217 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:33:37,930 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 551.02301 ± 206.614
2025-09-12 07:33:37,930 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [711.1172, 774.77014, 416.95886, 238.80533, 722.0465, 371.68344, 704.8941, 414.00385, 322.60263, 833.3482]
2025-09-12 07:33:37,930 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [264.0, 295.0, 159.0, 122.0, 272.0, 158.0, 284.0, 160.0, 151.0, 313.0]
2025-09-12 07:33:37,940 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 22/100 (estimated time remaining: 17 hours, 13 minutes, 58 seconds)
2025-09-12 07:45:23,861 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:45:23,869 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:46:43,059 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 703.39801 ± 318.169
2025-09-12 07:46:43,060 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [837.1641, 272.92477, 726.9555, 739.0068, 735.7318, 851.8604, 663.694, 253.21588, 1440.697, 512.729]
2025-09-12 07:46:43,060 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [306.0, 117.0, 319.0, 304.0, 324.0, 319.0, 235.0, 118.0, 543.0, 210.0]
2025-09-12 07:46:43,086 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 23/100 (estimated time remaining: 16 hours, 59 minutes, 53 seconds)
2025-09-12 07:58:39,419 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:58:39,421 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:59:50,193 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 605.86066 ± 212.563
2025-09-12 07:59:50,195 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [540.80005, 315.21103, 224.23438, 625.79333, 824.9698, 949.98267, 480.5392, 775.2345, 670.8714, 650.97015]
2025-09-12 07:59:50,195 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [231.0, 178.0, 112.0, 232.0, 310.0, 456.0, 185.0, 297.0, 268.0, 257.0]
2025-09-12 07:59:50,203 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 24/100 (estimated time remaining: 16 hours, 43 minutes, 7 seconds)
2025-09-12 08:11:47,931 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:11:47,940 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:12:51,368 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 535.18744 ± 218.058
2025-09-12 08:12:51,368 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [524.37476, 752.2326, 524.8908, 677.414, 336.8989, 801.18976, 247.99713, 132.66681, 596.2506, 757.9593]
2025-09-12 08:12:51,368 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [203.0, 279.0, 189.0, 258.0, 143.0, 312.0, 137.0, 222.0, 240.0, 274.0]
2025-09-12 08:12:51,386 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 25/100 (estimated time remaining: 16 hours, 27 minutes, 18 seconds)
2025-09-12 08:24:29,636 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:24:29,651 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:25:38,768 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 624.10645 ± 133.642
2025-09-12 08:25:38,770 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [672.99475, 753.2879, 665.9643, 577.0118, 773.6361, 700.57007, 555.0494, 348.83136, 747.3904, 446.32822]
2025-09-12 08:25:38,770 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [250.0, 307.0, 260.0, 239.0, 269.0, 264.0, 238.0, 163.0, 290.0, 178.0]
2025-09-12 08:25:38,778 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 26/100 (estimated time remaining: 16 hours, 14 minutes, 37 seconds)
2025-09-12 08:37:20,640 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:37:20,642 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:38:38,363 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 664.64630 ± 120.042
2025-09-12 08:38:38,365 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [727.4265, 516.9651, 655.9978, 639.5512, 482.13068, 660.8297, 580.5066, 703.97473, 924.7341, 754.346]
2025-09-12 08:38:38,365 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [253.0, 227.0, 269.0, 287.0, 223.0, 241.0, 247.0, 286.0, 418.0, 304.0]
2025-09-12 08:38:38,375 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 27/100 (estimated time remaining: 16 hours, 2 minutes, 6 seconds)
2025-09-12 08:50:36,424 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:50:36,426 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:51:48,302 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 663.58508 ± 138.015
2025-09-12 08:51:48,304 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [641.0007, 573.9712, 566.4064, 804.8684, 783.9501, 421.72424, 847.7747, 491.25146, 744.35754, 760.54626]
2025-09-12 08:51:48,304 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [269.0, 205.0, 215.0, 315.0, 324.0, 179.0, 304.0, 194.0, 281.0, 266.0]
2025-09-12 08:51:48,314 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 28/100 (estimated time remaining: 15 hours, 50 minutes, 16 seconds)
2025-09-12 09:03:45,085 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:03:45,087 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:05:00,863 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 636.30615 ± 176.685
2025-09-12 09:05:00,873 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [555.5833, 410.42468, 583.436, 726.78674, 355.8042, 704.17584, 806.1658, 497.9397, 935.7424, 787.0028]
2025-09-12 09:05:00,873 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [272.0, 196.0, 236.0, 281.0, 143.0, 271.0, 310.0, 202.0, 460.0, 318.0]
2025-09-12 09:05:00,897 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 29/100 (estimated time remaining: 15 hours, 38 minutes, 33 seconds)
2025-09-12 09:16:49,445 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:16:49,448 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:17:44,695 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 449.15741 ± 131.342
2025-09-12 09:17:44,696 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [498.40256, 236.91345, 465.13388, 524.0991, 317.91367, 296.6559, 536.1508, 517.32874, 397.2799, 701.6963]
2025-09-12 09:17:44,696 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [246.0, 119.0, 208.0, 200.0, 138.0, 143.0, 233.0, 213.0, 187.0, 267.0]
2025-09-12 09:17:44,715 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 30/100 (estimated time remaining: 15 hours, 21 minutes, 25 seconds)
2025-09-12 09:29:29,397 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:29:29,400 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:30:32,924 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 576.62323 ± 270.759
2025-09-12 09:30:32,925 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [829.7397, 255.55359, 605.8329, 821.4615, 574.34283, 688.9759, 417.0316, 3.0775983, 625.1587, 945.05786]
2025-09-12 09:30:32,925 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [320.0, 132.0, 255.0, 305.0, 242.0, 267.0, 160.0, 12.0, 231.0, 347.0]
2025-09-12 09:30:32,931 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 31/100 (estimated time remaining: 15 hours, 8 minutes, 38 seconds)
2025-09-12 09:42:23,070 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:42:23,072 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:43:38,766 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 705.52991 ± 83.246
2025-09-12 09:43:38,768 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [736.9755, 614.06464, 743.81104, 742.8323, 715.7683, 839.60846, 547.5302, 772.9894, 730.38727, 611.3319]
2025-09-12 09:43:38,768 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [292.0, 234.0, 273.0, 280.0, 268.0, 303.0, 244.0, 281.0, 292.0, 249.0]
2025-09-12 09:43:38,775 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 32/100 (estimated time remaining: 14 hours, 57 minutes, 5 seconds)
2025-09-12 09:55:41,490 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:55:41,493 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:56:54,334 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 665.90430 ± 166.587
2025-09-12 09:56:54,337 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [854.1902, 610.40674, 501.4176, 908.04364, 885.2965, 626.0851, 422.6334, 725.90796, 661.07367, 463.98846]
2025-09-12 09:56:54,337 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [341.0, 204.0, 210.0, 334.0, 353.0, 252.0, 167.0, 267.0, 257.0, 195.0]
2025-09-12 09:56:54,345 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 33/100 (estimated time remaining: 14 hours, 45 minutes, 22 seconds)
2025-09-12 10:08:36,535 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:08:36,538 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:09:48,002 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 652.34363 ± 239.626
2025-09-12 10:09:48,003 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [787.9745, 6.0777926, 800.0961, 740.6244, 577.0966, 752.52454, 951.55316, 620.11395, 611.1806, 676.19556]
2025-09-12 10:09:48,003 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [320.0, 15.0, 319.0, 277.0, 265.0, 277.0, 344.0, 231.0, 242.0, 279.0]
2025-09-12 10:09:48,012 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 34/100 (estimated time remaining: 14 hours, 28 minutes, 7 seconds)
2025-09-12 10:21:55,883 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:21:55,893 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:23:17,151 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 740.62469 ± 188.431
2025-09-12 10:23:17,152 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [610.22, 478.65448, 434.80594, 732.29193, 762.88904, 1050.4441, 749.1368, 837.98444, 1004.7848, 745.0358]
2025-09-12 10:23:17,152 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [236.0, 209.0, 176.0, 270.0, 345.0, 385.0, 284.0, 318.0, 375.0, 295.0]
2025-09-12 10:23:17,159 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 35/100 (estimated time remaining: 14 hours, 25 minutes, 8 seconds)
2025-09-12 10:35:02,491 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:35:02,493 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:36:16,553 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 720.78760 ± 295.142
2025-09-12 10:36:16,555 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [1090.0874, 893.1728, 787.29095, 1043.8486, 0.58836836, 473.24622, 819.6611, 749.3225, 702.8095, 647.8482]
2025-09-12 10:36:16,555 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [392.0, 341.0, 257.0, 403.0, 10.0, 176.0, 293.0, 261.0, 249.0, 241.0]
2025-09-12 10:36:16,562 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 36/100 (estimated time remaining: 14 hours, 14 minutes, 27 seconds)
2025-09-12 10:48:07,907 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:48:07,915 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:49:28,777 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 827.75897 ± 324.923
2025-09-12 10:49:28,777 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [50.864605, 577.50446, 858.5144, 902.2538, 950.7269, 1024.6338, 786.5286, 1269.8876, 1164.742, 691.9337]
2025-09-12 10:49:28,777 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [63.0, 207.0, 277.0, 288.0, 324.0, 383.0, 273.0, 410.0, 437.0, 231.0]
2025-09-12 10:49:28,777 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (827.76) for latency MM1Queue_a033_s075
2025-09-12 10:49:28,784 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 37/100 (estimated time remaining: 14 hours, 2 minutes, 40 seconds)
2025-09-12 11:01:34,424 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:01:34,442 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:03:09,160 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 930.26837 ± 140.842
2025-09-12 11:03:09,161 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [757.90466, 713.53375, 935.3911, 979.7093, 1252.3969, 1004.1959, 843.743, 901.2916, 947.339, 967.17914]
2025-09-12 11:03:09,161 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [268.0, 259.0, 338.0, 353.0, 466.0, 345.0, 288.0, 371.0, 326.0, 352.0]
2025-09-12 11:03:09,161 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (930.27) for latency MM1Queue_a033_s075
2025-09-12 11:03:09,172 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 38/100 (estimated time remaining: 13 hours, 54 minutes, 42 seconds)
2025-09-12 11:14:50,735 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:14:50,744 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:16:18,873 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 846.13055 ± 121.472
2025-09-12 11:16:18,875 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [900.1119, 801.12683, 1189.2354, 818.7279, 799.0628, 726.6016, 831.7405, 813.92773, 780.83716, 799.9338]
2025-09-12 11:16:18,875 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [377.0, 306.0, 403.0, 317.0, 291.0, 259.0, 305.0, 260.0, 296.0, 303.0]
2025-09-12 11:16:18,887 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 39/100 (estimated time remaining: 13 hours, 44 minutes, 46 seconds)
2025-09-12 11:28:16,375 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:28:16,384 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:29:38,585 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 862.36902 ± 174.476
2025-09-12 11:29:38,586 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [788.69745, 805.95807, 771.5177, 791.058, 873.1006, 932.03, 658.8962, 707.29596, 1304.6926, 990.44476]
2025-09-12 11:29:38,586 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [282.0, 273.0, 263.0, 280.0, 292.0, 313.0, 243.0, 241.0, 430.0, 324.0]
2025-09-12 11:29:38,592 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 40/100 (estimated time remaining: 13 hours, 29 minutes, 33 seconds)
2025-09-12 11:41:32,788 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:41:32,796 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:43:20,565 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 1140.81226 ± 353.308
2025-09-12 11:43:20,571 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [976.1106, 802.6062, 1388.0972, 1916.7169, 832.79016, 930.3311, 1104.1774, 1470.1085, 1268.6122, 718.5721]
2025-09-12 11:43:20,571 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [326.0, 263.0, 456.0, 647.0, 275.0, 316.0, 363.0, 483.0, 453.0, 248.0]
2025-09-12 11:43:20,572 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (1140.81) for latency MM1Queue_a033_s075
2025-09-12 11:43:20,584 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 41/100 (estimated time remaining: 13 hours, 24 minutes, 48 seconds)
2025-09-12 11:55:05,538 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:55:05,547 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:56:47,190 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 1085.74097 ± 279.968
2025-09-12 11:56:47,191 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [1469.4642, 735.1733, 997.8634, 751.9557, 1034.7257, 1354.8174, 839.11597, 1121.4269, 972.53046, 1580.3357]
2025-09-12 11:56:47,191 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [454.0, 265.0, 364.0, 260.0, 357.0, 440.0, 298.0, 333.0, 356.0, 494.0]
2025-09-12 11:56:47,197 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 42/100 (estimated time remaining: 13 hours, 14 minutes, 13 seconds)
2025-09-12 12:08:28,487 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:08:28,499 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:09:54,741 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 873.61505 ± 248.356
2025-09-12 12:09:54,741 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [782.4428, 1111.1449, 859.0586, 730.5645, 872.368, 860.1974, 1180.1979, 515.9363, 1303.4814, 520.7589]
2025-09-12 12:09:54,741 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [294.0, 369.0, 314.0, 263.0, 298.0, 298.0, 362.0, 204.0, 472.0, 215.0]
2025-09-12 12:09:54,747 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 43/100 (estimated time remaining: 12 hours, 54 minutes, 24 seconds)
2025-09-12 12:22:06,879 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:22:06,883 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:23:52,153 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 1146.63721 ± 548.378
2025-09-12 12:23:52,155 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [1020.82465, 0.56016415, 1244.5576, 1176.9427, 1489.7339, 850.5438, 778.76624, 1537.6204, 2230.0017, 1136.8206]
2025-09-12 12:23:52,155 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [333.0, 10.0, 413.0, 372.0, 509.0, 306.0, 257.0, 491.0, 657.0, 374.0]
2025-09-12 12:23:52,155 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (1146.64) for latency MM1Queue_a033_s075
2025-09-12 12:23:52,180 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 44/100 (estimated time remaining: 12 hours, 50 minutes, 7 seconds)
2025-09-12 12:35:52,787 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:35:52,792 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:38:35,399 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 1788.78577 ± 1032.678
2025-09-12 12:38:35,423 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [3094.2954, 1162.7496, 2568.5413, 1069.204, -0.55595225, 1241.2041, 981.9155, 3071.761, 3061.754, 1636.9884]
2025-09-12 12:38:35,423 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 386.0, 875.0, 344.0, 9.0, 376.0, 320.0, 1000.0, 972.0, 485.0]
2025-09-12 12:38:35,423 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (1788.79) for latency MM1Queue_a033_s075
2025-09-12 12:38:35,436 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 45/100 (estimated time remaining: 12 hours, 52 minutes, 12 seconds)
2025-09-12 12:50:16,662 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:50:16,667 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:51:44,522 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 996.13721 ± 497.460
2025-09-12 12:51:44,531 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [997.882, 1110.6506, 1867.794, 388.9857, 866.0501, 840.8587, -0.13163011, 1208.7006, 1246.7991, 1433.7832]
2025-09-12 12:51:44,531 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [305.0, 326.0, 585.0, 150.0, 275.0, 263.0, 9.0, 378.0, 386.0, 467.0]
2025-09-12 12:51:44,565 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 46/100 (estimated time remaining: 12 hours, 32 minutes, 23 seconds)
2025-09-12 13:03:31,962 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:03:31,972 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:06:01,945 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 1628.79126 ± 655.567
2025-09-12 13:06:01,948 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [1400.971, 1424.7748, 839.0368, 1938.107, 1029.898, 3022.9348, 1312.2604, 2281.137, 956.9098, 2081.8828]
2025-09-12 13:06:01,948 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [435.0, 486.0, 321.0, 615.0, 348.0, 1000.0, 406.0, 699.0, 306.0, 693.0]
2025-09-12 13:06:01,961 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 47/100 (estimated time remaining: 12 hours, 27 minutes, 51 seconds)
2025-09-12 13:18:07,420 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:18:07,429 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:20:21,924 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 1553.07031 ± 815.063
2025-09-12 13:20:21,926 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [997.72375, 1221.2971, 3351.0742, 816.58435, 859.7829, 600.2693, 1704.2654, 2459.8037, 1504.5397, 2015.3622]
2025-09-12 13:20:21,926 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [315.0, 371.0, 1000.0, 259.0, 259.0, 213.0, 456.0, 842.0, 462.0, 603.0]
2025-09-12 13:20:21,938 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 48/100 (estimated time remaining: 12 hours, 26 minutes, 48 seconds)
2025-09-12 13:32:17,486 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:32:17,497 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:35:42,425 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 2340.45654 ± 953.497
2025-09-12 13:35:42,426 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [1587.716, 3317.7349, 2553.9807, 3309.5703, 1302.4701, 971.1973, 3093.969, 2998.1921, 3263.5586, 1006.17755]
2025-09-12 13:35:42,426 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [482.0, 1000.0, 773.0, 1000.0, 446.0, 324.0, 1000.0, 1000.0, 1000.0, 313.0]
2025-09-12 13:35:42,426 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (2340.46) for latency MM1Queue_a033_s075
2025-09-12 13:35:42,438 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 49/100 (estimated time remaining: 12 hours, 27 minutes, 6 seconds)
2025-09-12 13:47:47,434 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:47:47,484 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:49:47,932 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 1413.76794 ± 760.909
2025-09-12 13:49:47,949 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [567.5727, 1480.9808, 1198.2944, 716.38477, 2103.5256, 1315.5344, 1079.3673, 3363.4329, 1208.5605, 1104.0264]
2025-09-12 13:49:47,949 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [200.0, 440.0, 360.0, 256.0, 623.0, 398.0, 342.0, 943.0, 384.0, 330.0]
2025-09-12 13:49:47,960 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 50/100 (estimated time remaining: 12 hours, 6 minutes, 19 seconds)
2025-09-12 14:01:52,366 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:01:52,370 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:04:29,617 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 1818.11389 ± 919.024
2025-09-12 14:04:29,618 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [1496.7325, 1098.601, 1853.8965, 985.2719, 750.4667, 2767.6924, 3309.648, 1784.1604, 897.1744, 3237.4946]
2025-09-12 14:04:29,618 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [478.0, 337.0, 581.0, 321.0, 297.0, 802.0, 1000.0, 551.0, 327.0, 943.0]
2025-09-12 14:04:29,627 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 51/100 (estimated time remaining: 12 hours, 7 minutes, 30 seconds)
2025-09-12 14:15:43,575 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:15:43,583 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:19:08,421 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 2408.94287 ± 975.556
2025-09-12 14:19:08,434 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [1137.4932, 895.4659, 1632.7195, 3497.332, 3221.944, 3502.7307, 3172.9746, 2911.8892, 2722.9082, 1393.9692]
2025-09-12 14:19:08,434 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [367.0, 319.0, 488.0, 990.0, 1000.0, 995.0, 1000.0, 840.0, 821.0, 436.0]
2025-09-12 14:19:08,434 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (2408.94) for latency MM1Queue_a033_s075
2025-09-12 14:19:08,446 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 52/100 (estimated time remaining: 11 hours, 56 minutes, 27 seconds)
2025-09-12 14:31:36,691 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:31:36,696 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:35:20,532 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 2559.98975 ± 881.828
2025-09-12 14:35:20,534 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [634.821, 2282.6067, 3213.4717, 2765.658, 3273.7246, 3268.1182, 1279.0541, 3192.7673, 2451.5793, 3238.0942]
2025-09-12 14:35:20,534 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [203.0, 710.0, 1000.0, 837.0, 1000.0, 1000.0, 408.0, 1000.0, 755.0, 1000.0]
2025-09-12 14:35:20,534 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (2559.99) for latency MM1Queue_a033_s075
2025-09-12 14:35:20,544 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 53/100 (estimated time remaining: 11 hours, 59 minutes, 46 seconds)
2025-09-12 14:46:45,119 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:46:45,126 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:49:09,564 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 1649.80103 ± 792.732
2025-09-12 14:49:09,567 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [2563.4893, 1642.9368, -0.36522606, 3052.3464, 2170.5588, 1326.5363, 1710.1683, 1166.5157, 1594.4823, 1271.3406]
2025-09-12 14:49:09,567 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [778.0, 515.0, 9.0, 1000.0, 642.0, 414.0, 527.0, 363.0, 493.0, 417.0]
2025-09-12 14:49:09,578 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 54/100 (estimated time remaining: 11 hours, 30 minutes, 27 seconds)
2025-09-12 15:01:21,915 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:01:21,922 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:05:33,935 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 2845.80225 ± 716.438
2025-09-12 15:05:33,937 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [3249.3008, 3121.7783, 857.2625, 2486.2383, 3350.873, 3156.008, 2620.0098, 3099.19, 3221.0825, 3296.2808]
2025-09-12 15:05:33,937 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 295.0, 763.0, 1000.0, 1000.0, 837.0, 1000.0, 1000.0, 1000.0]
2025-09-12 15:05:33,937 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (2845.80) for latency MM1Queue_a033_s075
2025-09-12 15:05:33,948 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 55/100 (estimated time remaining: 11 hours, 37 minutes, 3 seconds)
2025-09-12 15:17:00,243 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:17:00,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:21:10,338 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 2941.88232 ± 754.623
2025-09-12 15:21:10,351 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [3363.133, 3410.9685, 3235.9597, 3348.3223, 990.2312, 2109.3984, 2872.0999, 3402.796, 3296.1543, 3389.7578]
2025-09-12 15:21:10,351 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 328.0, 620.0, 858.0, 1000.0, 1000.0, 1000.0]
2025-09-12 15:21:10,351 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (2941.88) for latency MM1Queue_a033_s075
2025-09-12 15:21:10,362 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 56/100 (estimated time remaining: 11 hours, 30 minutes, 6 seconds)
2025-09-12 15:33:15,105 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:33:15,110 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:36:51,887 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 2521.59424 ± 988.087
2025-09-12 15:36:51,889 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [1467.9359, 833.5536, 1244.1793, 3339.8433, 1840.4537, 3235.6921, 3212.186, 3342.2446, 3402.914, 3296.9392]
2025-09-12 15:36:51,889 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [460.0, 309.0, 406.0, 1000.0, 572.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 15:36:51,905 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 57/100 (estimated time remaining: 11 hours, 23 minutes, 58 seconds)
2025-09-12 15:48:47,273 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:48:47,280 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:52:05,119 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 2360.90967 ± 832.889
2025-09-12 15:52:05,127 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [1836.6393, 3373.466, 2946.0405, 2994.1138, 2046.3259, 3361.6187, 972.49475, 1315.4824, 3002.6443, 1760.2701]
2025-09-12 15:52:05,127 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [517.0, 1000.0, 851.0, 889.0, 613.0, 1000.0, 307.0, 454.0, 883.0, 561.0]
2025-09-12 15:52:05,140 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 58/100 (estimated time remaining: 10 hours, 59 minutes, 59 seconds)
2025-09-12 16:03:55,088 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:03:55,099 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:06:29,678 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 1806.80798 ± 986.090
2025-09-12 16:06:29,685 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [2140.9832, 2084.2468, 894.64996, 2487.621, 3332.8096, 3206.837, 1638.5974, 151.06842, 862.7179, 1268.549]
2025-09-12 16:06:29,685 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [659.0, 596.0, 296.0, 731.0, 1000.0, 918.0, 499.0, 85.0, 289.0, 387.0]
2025-09-12 16:06:29,696 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 59/100 (estimated time remaining: 10 hours, 49 minutes, 36 seconds)
2025-09-12 16:18:47,980 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:18:47,986 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:22:38,559 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 2687.14307 ± 1292.491
2025-09-12 16:22:38,560 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [3393.6504, 3279.4094, 110.23839, 3339.5417, 101.06655, 3259.0688, 3434.1985, 3290.1643, 3434.3408, 3229.749]
2025-09-12 16:22:38,560 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 63.0, 1000.0, 56.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 16:22:38,571 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 60/100 (estimated time remaining: 10 hours, 32 minutes, 1 second)
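The "estimated time remaining" strings above use singular unit names where appropriate ("1 second", "1 minute"). The formatter itself is not shown in this log; a minimal sketch that reproduces the displayed strings, assuming the remaining-seconds estimate is already computed (e.g. average iteration time times iterations left):

```python
def format_eta(total_seconds: int) -> str:
    """Render seconds as 'H hours, M minutes, S seconds' with singular forms."""
    parts = []
    for unit, size in (("hour", 3600), ("minute", 60), ("second", 1)):
        value, total_seconds = divmod(total_seconds, size)
        parts.append(f"{value} {unit}" if value == 1 else f"{value} {unit}s")
    return ", ".join(parts)

# 10*3600 + 32*60 + 1 seconds, as logged at iteration 60 above
print(format_eta(37921))  # -> 10 hours, 32 minutes, 1 second
```

The `format_eta` name is hypothetical; only the output format is taken from the log.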
2025-09-12 16:34:19,881 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:34:19,888 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:38:59,536 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 3310.10474 ± 125.309
2025-09-12 16:38:59,537 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [3435.4788, 3339.5342, 3188.8953, 3445.4187, 3397.4583, 3019.4158, 3216.5986, 3339.0967, 3339.2246, 3379.9275]
2025-09-12 16:38:59,537 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 875.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 16:38:59,537 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (3310.10) for latency MM1Queue_a033_s075
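The "Total Reward: mean ± std" lines are consistent with the mean and population standard deviation (ddof=0) of the ten per-episode returns printed on the following "All rewards" line. A sketch reproducing the iteration-60 figures logged just above (the actual aggregation code is not shown in this log):

```python
import numpy as np

# Per-episode returns from the iteration-60 evaluation logged above.
rewards = np.array([3435.4788, 3339.5342, 3188.8953, 3445.4187, 3397.4583,
                    3019.4158, 3216.5986, 3339.0967, 3339.2246, 3379.9275])

mean = rewards.mean()
std = rewards.std()  # population std (ddof=0) matches the logged ±125.309

print(f"Total Reward: {mean:.5f} \u00b1 {std:.3f}")
```

A sample standard deviation (ddof=1) would give roughly 132 here, so the logged value points to the population form.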
2025-09-12 16:38:59,546 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 61/100 (estimated time remaining: 10 hours, 22 minutes, 33 seconds)
2025-09-12 16:51:41,160 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:51:41,173 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:56:08,369 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 3203.49585 ± 598.932
2025-09-12 16:56:08,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [3427.4077, 1408.7374, 3343.1907, 3385.6797, 3374.7007, 3426.4998, 3402.9795, 3402.173, 3449.8057, 3413.7866]
2025-09-12 16:56:08,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 448.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 16:56:08,386 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 62/100 (estimated time remaining: 10 hours, 18 minutes, 20 seconds)
2025-09-12 17:07:36,310 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:07:36,320 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:12:08,950 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 3333.20312 ± 315.281
2025-09-12 17:12:08,951 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [3518.37, 2997.593, 3450.456, 2547.1348, 3526.2495, 3408.2256, 3266.2769, 3538.0337, 3682.5225, 3397.167]
2025-09-12 17:12:08,951 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 890.0, 1000.0, 783.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 17:12:08,951 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (3333.20) for latency MM1Queue_a033_s075
2025-09-12 17:12:08,965 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 63/100 (estimated time remaining: 10 hours, 8 minutes, 29 seconds)
2025-09-12 17:24:10,838 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:24:10,844 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:27:46,791 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 2586.82520 ± 1078.138
2025-09-12 17:27:46,793 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [1807.834, 264.97546, 2821.8105, 3364.3232, 1169.0388, 2675.5417, 3540.283, 3509.278, 3366.317, 3348.849]
2025-09-12 17:27:46,793 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [553.0, 123.0, 816.0, 1000.0, 351.0, 800.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 17:27:46,804 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 64/100 (estimated time remaining: 10 hours, 1 minute, 30 seconds)
2025-09-12 17:39:04,671 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:39:04,676 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:43:14,670 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 3087.26318 ± 1023.824
2025-09-12 17:43:14,672 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [3555.1304, 3415.6418, 3420.2559, 3515.7964, 3192.0278, 3483.8772, 3497.819, 3358.8142, 29.465788, 3403.804]
2025-09-12 17:43:14,672 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 932.0, 1000.0, 1000.0, 1000.0, 48.0, 1000.0]
2025-09-12 17:43:14,684 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 65/100 (estimated time remaining: 9 hours, 40 minutes, 20 seconds)
2025-09-12 17:55:57,948 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:55:57,960 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:00:28,045 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 3324.57104 ± 255.695
2025-09-12 18:00:28,047 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [3351.2502, 3396.609, 3266.0964, 3464.253, 3473.697, 2577.757, 3437.1426, 3447.5396, 3398.3413, 3433.0242]
2025-09-12 18:00:28,047 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 744.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 18:00:28,058 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 66/100 (estimated time remaining: 9 hours, 30 minutes, 19 seconds)
2025-09-12 18:11:23,369 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:11:23,380 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:15:42,861 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 3096.28564 ± 820.349
2025-09-12 18:15:42,867 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [3346.3904, 3343.08, 3385.493, 3402.0269, 3434.2478, 637.13635, 3337.661, 3320.0083, 3375.699, 3381.116]
2025-09-12 18:15:42,867 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 242.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 18:15:42,898 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 67/100 (estimated time remaining: 9 hours, 1 minute, 6 seconds)
2025-09-12 18:27:39,177 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:27:39,206 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:32:06,727 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 3322.75830 ± 380.476
2025-09-12 18:32:06,728 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [3228.849, 3431.2048, 3536.04, 3430.8804, 3336.437, 3509.9338, 3570.3992, 3490.3152, 3476.0825, 2217.4404]
2025-09-12 18:32:06,728 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [917.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 662.0]
2025-09-12 18:32:06,736 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 68/100 (estimated time remaining: 8 hours, 47 minutes, 45 seconds)
2025-09-12 18:44:02,810 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:44:02,822 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:47:56,089 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 2840.85986 ± 1090.938
2025-09-12 18:47:56,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [3392.4045, 3455.7695, 3269.5186, 3477.8298, 2901.7869, 3341.076, 3473.1487, 3482.4998, 1611.4519, 3.1135345]
2025-09-12 18:47:56,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 952.0, 1000.0, 823.0, 1000.0, 1000.0, 1000.0, 508.0, 12.0]
2025-09-12 18:47:56,105 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 69/100 (estimated time remaining: 8 hours, 32 minutes, 59 seconds)
2025-09-12 18:59:50,501 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:59:50,528 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:04:04,224 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 3144.41064 ± 653.713
2025-09-12 19:04:04,225 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [3529.3916, 3539.1367, 3516.469, 3483.3184, 1754.3529, 3152.1772, 3538.9492, 3425.7087, 1964.4269, 3540.1763]
2025-09-12 19:04:04,225 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 527.0, 886.0, 1000.0, 1000.0, 575.0, 1000.0]
2025-09-12 19:04:04,234 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 70/100 (estimated time remaining: 8 hours, 21 minutes, 7 seconds)
2025-09-12 19:15:53,493 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:15:53,504 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:20:07,200 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 3110.19824 ± 1037.932
2025-09-12 19:20:07,201 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [3431.5642, 3490.1584, 3423.5298, 3536.1304, 0.8313558, 3507.437, 3332.1716, 3414.0586, 3492.4673, 3473.6335]
2025-09-12 19:20:07,201 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 10.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 19:20:07,210 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 71/100 (estimated time remaining: 7 hours, 57 minutes, 54 seconds)
2025-09-12 19:31:59,638 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:31:59,650 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:36:41,690 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 3498.20557 ± 56.402
2025-09-12 19:36:41,694 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [3559.8672, 3448.2979, 3401.5986, 3492.4338, 3564.971, 3457.605, 3532.5894, 3580.9404, 3454.0918, 3489.66]
2025-09-12 19:36:41,694 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 19:36:41,694 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (3498.21) for latency MM1Queue_a033_s075
2025-09-12 19:36:41,711 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 72/100 (estimated time remaining: 7 hours, 49 minutes, 41 seconds)
2025-09-12 19:48:34,230 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:48:34,243 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:52:47,738 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 3104.81592 ± 972.095
2025-09-12 19:52:47,740 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [3440.6677, 3422.8643, 3392.5474, 3474.3718, 3411.6113, 3454.1663, 189.35512, 3416.419, 3405.638, 3440.5183]
2025-09-12 19:52:47,740 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 92.0, 1000.0, 1000.0, 1000.0]
2025-09-12 19:52:47,749 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 73/100 (estimated time remaining: 7 hours, 31 minutes, 49 seconds)
2025-09-12 20:04:39,602 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:04:39,622 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:09:00,704 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 3222.64307 ± 576.646
2025-09-12 20:09:00,705 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [1529.0631, 3079.5073, 3521.8186, 3430.7493, 3407.9429, 3387.169, 3505.0513, 3449.883, 3472.8118, 3442.432]
2025-09-12 20:09:00,705 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [488.0, 904.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 986.0]
2025-09-12 20:09:00,715 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 74/100 (estimated time remaining: 7 hours, 17 minutes, 48 seconds)
2025-09-12 20:20:53,744 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:20:53,765 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:25:30,423 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 3519.51758 ± 32.723
2025-09-12 20:25:30,443 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [3548.387, 3518.323, 3517.2915, 3570.723, 3469.6025, 3543.7742, 3529.556, 3517.8633, 3455.919, 3523.7373]
2025-09-12 20:25:30,443 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:25:30,443 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (3519.52) for latency MM1Queue_a033_s075
2025-09-12 20:25:30,463 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 75/100 (estimated time remaining: 7 hours, 3 minutes, 28 seconds)
2025-09-12 20:37:29,820 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:37:29,833 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:42:08,394 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 3476.37109 ± 39.748
2025-09-12 20:42:08,397 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [3492.316, 3438.1274, 3528.1553, 3446.5369, 3536.199, 3474.318, 3430.1992, 3421.4927, 3518.5781, 3477.7888]
2025-09-12 20:42:08,397 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:42:08,411 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 76/100 (estimated time remaining: 6 hours, 50 minutes, 6 seconds)
2025-09-12 20:54:00,765 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:54:00,779 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:58:09,651 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 3131.84521 ± 982.498
2025-09-12 20:58:09,651 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [237.35912, 3608.2034, 3398.2932, 3555.2195, 3572.7334, 3476.1582, 3549.1677, 3513.3538, 3482.1687, 2925.7966]
2025-09-12 20:58:09,651 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [109.0, 1000.0, 942.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 821.0]
2025-09-12 20:58:09,661 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 77/100 (estimated time remaining: 6 hours, 31 minutes, 2 seconds)
2025-09-12 21:09:54,584 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:09:54,597 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:14:07,843 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 3191.72998 ± 1064.544
2025-09-12 21:14:07,845 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [3531.5364, 3557.6707, 3513.149, 3503.5164, -0.83814937, 3569.3296, 3569.537, 3596.5657, 3555.1074, 3521.7275]
2025-09-12 21:14:07,845 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 11.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:14:07,856 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 78/100 (estimated time remaining: 6 hours, 14 minutes, 8 seconds)
2025-09-12 21:25:59,210 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:25:59,235 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:30:41,033 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 3449.06836 ± 38.156
2025-09-12 21:30:41,035 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [3477.5261, 3484.1628, 3391.5156, 3499.7363, 3436.2048, 3455.2944, 3417.3416, 3500.9392, 3402.273, 3425.6895]
2025-09-12 21:30:41,035 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:30:41,046 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 79/100 (estimated time remaining: 5 hours, 59 minutes, 21 seconds)
2025-09-12 21:42:32,725 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:42:32,737 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:47:14,826 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 3438.14136 ± 46.313
2025-09-12 21:47:14,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [3433.0789, 3467.1802, 3426.9922, 3369.9265, 3392.5781, 3436.8806, 3538.3555, 3419.9785, 3487.901, 3408.5437]
2025-09-12 21:47:14,847 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:47:14,860 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 80/100 (estimated time remaining: 5 hours, 43 minutes, 18 seconds)
2025-09-12 21:58:29,363 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:58:29,374 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:02:29,641 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 2964.39038 ± 1102.268
2025-09-12 22:02:29,642 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [3488.853, 3520.4255, 3527.1804, 3457.9321, 3409.991, 3472.1487, 1836.6063, 2.0510724, 3493.7148, 3435.0012]
2025-09-12 22:02:29,642 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 538.0, 12.0, 1000.0, 1000.0]
2025-09-12 22:02:29,656 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 81/100 (estimated time remaining: 5 hours, 21 minutes, 24 seconds)
2025-09-12 22:14:30,174 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:14:30,185 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:19:11,248 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 3548.49414 ± 22.827
2025-09-12 22:19:11,249 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [3546.24, 3511.2048, 3542.562, 3530.3489, 3579.3718, 3579.8384, 3521.9807, 3541.9856, 3575.1091, 3556.301]
2025-09-12 22:19:11,250 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:19:11,250 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (3548.49) for latency MM1Queue_a033_s075
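The "New best (…)" lines fire whenever the mean evaluation return for a latency configuration exceeds the best seen so far, as happens several times above for `MM1Queue_a033_s075`. A minimal sketch of that bookkeeping, assuming a simple per-key dictionary (the checkpointing side effects are not visible in the log):

```python
# Best mean evaluation return seen so far, keyed by latency configuration.
best_rewards: dict[str, float] = {}

def maybe_update_best(latency_key: str, mean_reward: float) -> bool:
    """Record and return True if mean_reward beats the stored best for this key."""
    if mean_reward > best_rewards.get(latency_key, float("-inf")):
        best_rewards[latency_key] = mean_reward
        return True  # caller would log "New best" and save a checkpoint here
    return False

maybe_update_best("MM1Queue_a033_s075", 3498.21)  # -> True, first/new best
maybe_update_best("MM1Queue_a033_s075", 3104.82)  # -> False, below best
maybe_update_best("MM1Queue_a033_s075", 3548.49)  # -> True
```

Note that with this scheme a run like iteration 75's 3476.37 above correctly leaves the stored best (3519.52) untouched.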
2025-09-12 22:19:11,263 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 82/100 (estimated time remaining: 5 hours, 7 minutes, 54 seconds)
2025-09-12 22:31:04,386 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:31:04,399 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:35:45,469 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 3524.00439 ± 30.958
2025-09-12 22:35:45,470 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [3534.9873, 3536.597, 3561.3726, 3494.6602, 3546.129, 3481.4272, 3482.4597, 3548.5107, 3561.5986, 3492.3]
2025-09-12 22:35:45,470 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:35:45,484 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 83/100 (estimated time remaining: 4 hours, 53 minutes, 51 seconds)
2025-09-12 22:48:22,138 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:48:22,150 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:53:00,744 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 3483.57959 ± 147.763
2025-09-12 22:53:00,744 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [3578.5264, 3069.3079, 3557.2483, 3529.5732, 3535.5237, 3591.2761, 3434.3828, 3434.4084, 3579.71, 3525.84]
2025-09-12 22:53:00,745 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 871.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:53:00,755 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 84/100 (estimated time remaining: 4 hours, 39 minutes, 55 seconds)
2025-09-12 23:04:51,646 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:04:51,652 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 23:08:35,680 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 2854.69824 ± 1427.173
2025-09-12 23:08:35,681 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [3591.3718, 3599.6982, 1.9488386, 0.5353054, 3559.8452, 3592.1528, 3568.4438, 3616.664, 3530.2456, 3486.0762]
2025-09-12 23:08:35,681 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 12.0, 10.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 23:08:35,694 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 85/100 (estimated time remaining: 4 hours, 20 minutes, 18 seconds)
2025-09-12 23:20:27,623 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:20:27,637 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 23:24:33,575 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 3078.64185 ± 998.394
2025-09-12 23:24:33,576 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [3546.3142, 3521.7542, 3601.5515, 3526.5903, 3573.8162, 493.0738, 1865.533, 3536.2542, 3595.851, 3525.6794]
2025-09-12 23:24:33,576 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 182.0, 539.0, 1000.0, 1000.0, 1000.0]
2025-09-12 23:24:33,586 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 86/100 (estimated time remaining: 4 hours, 6 minutes, 11 seconds)
2025-09-12 23:36:25,564 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:36:25,575 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 23:41:03,970 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 3522.55396 ± 23.949
2025-09-12 23:41:03,974 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [3483.9944, 3497.7297, 3541.6956, 3540.8848, 3550.6355, 3515.6113, 3520.575, 3512.7632, 3561.2927, 3500.3552]
2025-09-12 23:41:03,974 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 23:41:03,986 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 87/100 (estimated time remaining: 3 hours, 49 minutes, 15 seconds)
2025-09-12 23:52:38,880 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:52:38,892 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 23:56:52,867 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 3224.22705 ± 1072.706
2025-09-12 23:56:52,868 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [3570.0457, 3627.2097, 3588.8225, 3613.5984, 3521.4985, 3636.538, 8.096616, 3548.5625, 3528.1477, 3599.7522]
2025-09-12 23:56:52,868 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 16.0, 1000.0, 1000.0, 1000.0]
2025-09-12 23:56:52,882 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 88/100 (estimated time remaining: 3 hours, 30 minutes, 55 seconds)
2025-09-13 00:08:48,007 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:08:48,020 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 00:13:02,273 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 3223.20508 ± 1075.556
2025-09-13 00:13:02,275 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [3.663108, 3644.1418, 3591.5186, 3613.8994, 3654.5957, 3639.947, 3561.2822, 3575.3398, 3391.469, 3556.1948]
2025-09-13 00:13:02,275 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [14.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 00:13:02,291 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 89/100 (estimated time remaining: 3 hours, 12 minutes, 3 seconds)
2025-09-13 00:24:54,974 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:24:54,988 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 00:29:05,780 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 3190.83545 ± 1064.609
2025-09-13 00:29:05,784 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [3544.4878, 3540.808, 3564.0696, 3506.0742, 3524.0383, 3532.2957, 3591.8555, 3610.818, -1.3717158, 3495.2795]
2025-09-13 00:29:05,784 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 9.0, 1000.0]
2025-09-13 00:29:05,820 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 90/100 (estimated time remaining: 2 hours, 57 minutes, 6 seconds)
2025-09-13 00:41:00,653 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:41:00,665 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 00:45:41,350 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 3583.03076 ± 46.740
2025-09-13 00:45:41,351 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [3557.765, 3609.3892, 3616.39, 3511.7922, 3681.5793, 3607.3088, 3576.0476, 3585.0486, 3523.9883, 3560.9995]
2025-09-13 00:45:41,351 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 00:45:41,351 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (3583.03) for latency MM1Queue_a033_s075
2025-09-13 00:45:41,362 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 91/100 (estimated time remaining: 2 hours, 42 minutes, 15 seconds)
2025-09-13 00:57:34,303 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:57:34,317 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 01:02:13,433 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 3516.29053 ± 25.269
2025-09-13 01:02:13,435 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [3553.603, 3496.4348, 3527.7664, 3473.0408, 3525.5554, 3486.8228, 3544.7786, 3529.9548, 3494.9429, 3530.0054]
2025-09-13 01:02:13,435 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 01:02:13,446 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 92/100 (estimated time remaining: 2 hours, 26 minutes, 5 seconds)
2025-09-13 01:13:43,821 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:13:43,833 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 01:17:58,191 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 3215.72266 ± 1026.110
2025-09-13 01:17:58,192 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [3517.6206, 3588.5461, 3549.9343, 3586.9685, 3538.9526, 3570.3157, 138.4456, 3531.875, 3601.7415, 3532.8252]
2025-09-13 01:17:58,192 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 90.0, 1000.0, 1000.0, 1000.0]
2025-09-13 01:17:58,210 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 93/100 (estimated time remaining: 2 hours, 9 minutes, 44 seconds)
2025-09-13 01:29:56,362 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:29:56,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 01:34:10,479 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 3261.65503 ± 1087.327
2025-09-13 01:34:10,480 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [2.1635616, 3713.0596, 3652.1318, 3629.364, 3637.3293, 3623.635, 3567.6777, 3617.7046, 3547.2896, 3626.1946]
2025-09-13 01:34:10,480 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [12.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 01:34:10,492 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 53 minutes, 35 seconds)
2025-09-13 01:46:02,868 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:46:02,882 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 01:49:48,363 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 2829.10498 ± 1413.588
2025-09-13 01:49:48,365 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [3574.0566, 3557.5898, 3562.5237, 3515.1816, 1.6544771, 3475.5269, 3551.5315, 3546.3784, 3.35379, 3503.2532]
2025-09-13 01:49:48,365 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 11.0, 1000.0, 1000.0, 1000.0, 13.0, 1000.0]
2025-09-13 01:49:48,381 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 36 minutes, 51 seconds)
2025-09-13 02:01:47,893 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:01:47,905 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 02:06:29,134 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 3594.44995 ± 28.152
2025-09-13 02:06:29,135 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [3591.5654, 3594.0056, 3586.4165, 3584.2163, 3641.8994, 3629.2595, 3584.8894, 3622.1484, 3567.7102, 3542.3848]
2025-09-13 02:06:29,135 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 02:06:29,135 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (3594.45) for latency MM1Queue_a033_s075
2025-09-13 02:06:29,150 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 20 minutes, 47 seconds)
2025-09-13 02:18:22,338 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:18:22,353 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 02:23:03,940 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 3619.34180 ± 62.058
2025-09-13 02:23:03,952 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [3496.3596, 3716.8005, 3596.6272, 3575.7668, 3611.303, 3712.5627, 3592.2573, 3651.3057, 3598.2712, 3642.1604]
2025-09-13 02:23:03,952 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 02:23:03,952 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (3619.34) for latency MM1Queue_a033_s075
2025-09-13 02:23:03,966 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 97/100 (estimated time remaining: 1 hour, 4 minutes, 40 seconds)
2025-09-13 02:34:57,385 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:34:57,397 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 02:39:11,317 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 3245.70972 ± 1075.442
2025-09-13 02:39:11,318 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [3666.5466, 3600.733, 3547.8955, 21.266994, 3629.886, 3534.413, 3632.8608, 3611.3203, 3608.6462, 3603.5286]
2025-09-13 02:39:11,318 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 31.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 02:39:11,330 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 98/100 (estimated time remaining: 48 minutes, 43 seconds)
2025-09-13 02:51:03,413 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:51:03,426 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 02:55:43,483 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 3471.74951 ± 26.197
2025-09-13 02:55:43,496 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [3450.345, 3485.9014, 3434.4036, 3442.3594, 3508.4346, 3451.48, 3479.7283, 3479.907, 3467.9707, 3516.9685]
2025-09-13 02:55:43,496 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 02:55:43,511 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 99/100 (estimated time remaining: 32 minutes, 37 seconds)
2025-09-13 03:07:36,093 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:07:36,105 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 03:12:16,607 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 3607.61279 ± 101.029
2025-09-13 03:12:16,608 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [3634.2854, 3307.8872, 3631.9043, 3625.8628, 3622.9, 3655.6536, 3662.0435, 3643.8916, 3666.5066, 3625.1943]
2025-09-13 03:12:16,608 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 03:12:16,620 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 100/100 (estimated time remaining: 16 minutes, 29 seconds)
2025-09-13 03:24:11,300 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:24:11,312 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 03:28:53,709 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 3653.42114 ± 35.721
2025-09-13 03:28:53,716 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [3636.719, 3605.4844, 3697.587, 3694.04, 3694.3557, 3606.1396, 3691.2744, 3650.2844, 3634.2476, 3624.0815]
2025-09-13 03:28:53,716 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 03:28:53,716 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (3653.42) for latency MM1Queue_a033_s075
2025-09-13 03:28:53,733 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1251 [DEBUG]: Training session finished
