2025-09-12 02:54:56,102 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noiseperc5-walker2d/MM1Queue_a033_s075-mbpac_memdelay
2025-09-12 02:54:56,102 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noiseperc5-walker2d/MM1Queue_a033_s075-mbpac_memdelay
2025-09-12 02:54:56,102 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x145e4fbf2b90>}
2025-09-12 02:54:56,102 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1111 [DEBUG]: using device: cuda
2025-09-12 02:54:56,126 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1133 [INFO]: Creating new trainer
2025-09-12 02:54:56,131 baseline-mbpac-noiseperc5-walker2d:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-09-12 02:54:56,131 baseline-mbpac-noiseperc5-walker2d:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-12 02:54:56,139 baseline-mbpac-noiseperc5-walker2d:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=6, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 384, batch_first=True)
)
2025-09-12 02:54:57,458 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1194 [DEBUG]: Starting training session...
2025-09-12 02:54:57,458 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 1/100
2025-09-12 03:05:19,093 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:05:19,094 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:06:08,537 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 138.18903 ± 172.329
2025-09-12 03:06:08,538 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [-30.28158, 253.70251, 256.84592, 469.55524, -20.610237, 11.689891, 238.45439, -31.089424, 275.7514, -42.127777]
2025-09-12 03:06:08,538 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [111.0, 151.0, 240.0, 458.0, 81.0, 106.0, 130.0, 116.0, 238.0, 140.0]
2025-09-12 03:06:08,538 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1226 [INFO]: New best (138.19) for latency MM1Queue_a033_s075
2025-09-12 03:06:08,545 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 2/100 (estimated time remaining: 18 hours, 27 minutes, 17 seconds)
2025-09-12 03:17:58,857 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:17:58,868 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:18:57,823 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 180.44519 ± 122.156
2025-09-12 03:18:57,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [44.04114, 188.70148, 196.22484, 264.17813, 426.44638, 220.99342, 39.45551, 19.303774, 287.12344, 117.98378]
2025-09-12 03:18:57,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [183.0, 170.0, 320.0, 157.0, 318.0, 161.0, 146.0, 144.0, 180.0, 349.0]
2025-09-12 03:18:57,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1226 [INFO]: New best (180.45) for latency MM1Queue_a033_s075
2025-09-12 03:18:57,839 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 3/100 (estimated time remaining: 19 hours, 36 minutes, 18 seconds)
2025-09-12 03:30:51,029 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:30:51,033 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:31:49,582 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 116.30029 ± 128.782
2025-09-12 03:31:49,582 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [24.456, 98.73303, 203.16017, 17.425152, 31.021322, 16.30138, 447.6075, 199.51825, 60.349514, 64.43062]
2025-09-12 03:31:49,582 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [66.0, 269.0, 153.0, 197.0, 119.0, 167.0, 700.0, 144.0, 157.0, 105.0]
2025-09-12 03:31:49,606 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 4/100 (estimated time remaining: 19 hours, 52 minutes, 6 seconds)
2025-09-12 03:43:43,954 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:43:43,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:44:31,961 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 278.18222 ± 97.140
2025-09-12 03:44:31,961 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [365.70627, 263.49298, 345.972, 230.71231, 20.381037, 292.88217, 319.9591, 355.35962, 244.8038, 342.5533]
2025-09-12 03:44:31,961 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [199.0, 175.0, 185.0, 124.0, 29.0, 170.0, 204.0, 228.0, 121.0, 277.0]
2025-09-12 03:44:31,961 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1226 [INFO]: New best (278.18) for latency MM1Queue_a033_s075
2025-09-12 03:44:32,006 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 5/100 (estimated time remaining: 19 hours, 49 minutes, 49 seconds)
2025-09-12 03:56:08,627 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:56:08,630 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:57:01,281 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 236.20613 ± 127.989
2025-09-12 03:57:01,281 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [344.05804, 277.521, 374.8011, 160.05847, 17.507652, 143.43565, 319.81177, 420.96567, 234.73216, 69.169754]
2025-09-12 03:57:01,281 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [197.0, 146.0, 328.0, 187.0, 27.0, 101.0, 216.0, 315.0, 155.0, 189.0]
2025-09-12 03:57:01,312 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 6/100 (estimated time remaining: 19 hours, 39 minutes, 13 seconds)
2025-09-12 04:09:03,547 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:09:03,550 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:10:01,178 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 267.59686 ± 78.691
2025-09-12 04:10:01,178 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [235.15996, 318.9188, 282.16226, 198.46432, 420.195, 286.74933, 197.52371, 268.33847, 128.26952, 340.18762]
2025-09-12 04:10:01,178 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [129.0, 163.0, 230.0, 149.0, 350.0, 198.0, 222.0, 193.0, 221.0, 197.0]
2025-09-12 04:10:01,191 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 7/100 (estimated time remaining: 20 hours, 53 seconds)
2025-09-12 04:21:38,865 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:21:38,869 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:22:13,240 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 188.50623 ± 70.964
2025-09-12 04:22:13,241 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [176.71709, 234.33754, 266.4208, 4.315665, 253.85913, 181.13411, 180.17064, 167.37074, 176.27206, 244.46451]
2025-09-12 04:22:13,241 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [110.0, 147.0, 163.0, 13.0, 144.0, 117.0, 163.0, 111.0, 111.0, 143.0]
2025-09-12 04:22:13,250 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 8/100 (estimated time remaining: 19 hours, 36 minutes, 34 seconds)
2025-09-12 04:34:18,450 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:34:18,453 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:35:12,178 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 233.58931 ± 100.518
2025-09-12 04:35:12,179 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [243.82106, 176.86996, 324.25076, 205.20364, 221.65422, 55.00519, 257.97968, 463.57025, 193.26164, 194.27667]
2025-09-12 04:35:12,179 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [160.0, 148.0, 327.0, 237.0, 134.0, 72.0, 145.0, 357.0, 185.0, 150.0]
2025-09-12 04:35:12,191 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 9/100 (estimated time remaining: 19 hours, 26 minutes, 7 seconds)
2025-09-12 04:46:47,553 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:46:47,555 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:47:33,357 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 261.35767 ± 96.550
2025-09-12 04:47:33,358 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [270.66714, 64.54451, 323.21906, 268.02533, 105.47715, 330.07004, 383.3118, 330.28076, 239.28036, 298.70056]
2025-09-12 04:47:33,358 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [153.0, 144.0, 176.0, 147.0, 143.0, 176.0, 203.0, 196.0, 139.0, 183.0]
2025-09-12 04:47:33,381 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 10/100 (estimated time remaining: 19 hours, 7 minutes, 1 second)
2025-09-12 04:59:33,378 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:59:33,393 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:00:08,875 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 226.51714 ± 129.825
2025-09-12 05:00:08,875 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [365.47003, 33.070244, 297.19272, 169.65315, 190.43404, 8.89585, 182.37683, 265.2295, 433.58035, 319.26877]
2025-09-12 05:00:08,875 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [192.0, 42.0, 167.0, 92.0, 100.0, 22.0, 108.0, 111.0, 283.0, 162.0]
2025-09-12 05:00:08,881 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 11/100 (estimated time remaining: 18 hours, 56 minutes, 16 seconds)
2025-09-12 05:11:56,317 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:11:56,319 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:12:36,100 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 249.23270 ± 127.052
2025-09-12 05:12:36,100 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [140.86032, 421.32782, 154.92717, 233.88324, 174.7629, 165.56564, 501.49506, 383.46887, 164.7262, 151.30986]
2025-09-12 05:12:36,100 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [78.0, 205.0, 78.0, 241.0, 97.0, 150.0, 224.0, 187.0, 86.0, 92.0]
2025-09-12 05:12:36,118 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 12/100 (estimated time remaining: 18 hours, 33 minutes, 57 seconds)
2025-09-12 05:24:21,563 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:24:21,567 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:25:16,371 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 288.92624 ± 160.243
2025-09-12 05:25:16,371 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [303.35318, 198.85031, 61.171665, 436.24347, 43.378216, 297.38876, 466.01373, 511.6793, 156.14333, 415.04053]
2025-09-12 05:25:16,371 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [201.0, 99.0, 179.0, 323.0, 54.0, 210.0, 201.0, 356.0, 96.0, 253.0]
2025-09-12 05:25:16,371 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1226 [INFO]: New best (288.93) for latency MM1Queue_a033_s075
2025-09-12 05:25:16,394 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 13/100 (estimated time remaining: 18 hours, 29 minutes, 43 seconds)
2025-09-12 05:37:07,582 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:37:07,584 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:38:03,547 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 348.29535 ± 139.759
2025-09-12 05:38:03,547 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [395.51993, 386.6129, 139.85165, 425.63864, 224.48286, 325.6526, 354.23007, 686.024, 291.95477, 252.9862]
2025-09-12 05:38:03,547 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [175.0, 196.0, 99.0, 277.0, 171.0, 171.0, 159.0, 491.0, 125.0, 133.0]
2025-09-12 05:38:03,547 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1226 [INFO]: New best (348.30) for latency MM1Queue_a033_s075
2025-09-12 05:38:03,555 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 14/100 (estimated time remaining: 18 hours, 13 minutes, 41 seconds)
2025-09-12 05:50:03,817 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:50:03,827 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:50:51,437 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 356.54211 ± 183.333
2025-09-12 05:50:51,437 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [317.84418, 237.68327, 793.266, 373.75745, 303.77487, 279.22336, 594.2304, 160.69955, 206.71672, 298.2254]
2025-09-12 05:50:51,437 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [179.0, 111.0, 387.0, 144.0, 121.0, 122.0, 305.0, 96.0, 97.0, 146.0]
2025-09-12 05:50:51,437 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1226 [INFO]: New best (356.54) for latency MM1Queue_a033_s075
2025-09-12 05:50:51,449 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 15/100 (estimated time remaining: 18 hours, 8 minutes, 46 seconds)
2025-09-12 06:02:35,404 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:02:35,406 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:03:21,772 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 377.67355 ± 188.953
2025-09-12 06:03:21,772 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [575.5891, 155.45827, 228.33296, 477.09583, 510.57086, 230.0013, 188.33913, 770.35095, 331.54712, 309.4502]
2025-09-12 06:03:21,772 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [251.0, 93.0, 111.0, 187.0, 215.0, 109.0, 96.0, 303.0, 157.0, 133.0]
2025-09-12 06:03:21,772 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1226 [INFO]: New best (377.67) for latency MM1Queue_a033_s075
2025-09-12 06:03:21,783 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 16/100 (estimated time remaining: 17 hours, 54 minutes, 39 seconds)
2025-09-12 06:15:19,698 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:15:19,701 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:16:16,764 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 441.13248 ± 142.546
2025-09-12 06:16:16,764 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [339.26807, 166.40228, 399.49957, 352.6642, 451.50168, 433.2906, 610.54974, 534.34674, 418.63376, 705.1679]
2025-09-12 06:16:16,765 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [176.0, 90.0, 170.0, 183.0, 197.0, 188.0, 329.0, 234.0, 218.0, 268.0]
2025-09-12 06:16:16,765 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1226 [INFO]: New best (441.13) for latency MM1Queue_a033_s075
2025-09-12 06:16:16,783 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 17/100 (estimated time remaining: 17 hours, 49 minutes, 47 seconds)
2025-09-12 06:28:04,012 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:28:04,019 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:29:11,163 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 436.71323 ± 141.630
2025-09-12 06:29:11,165 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [385.6474, 381.96927, 686.8115, 341.1867, 138.97285, 424.61456, 470.22037, 421.01688, 516.17694, 600.5158]
2025-09-12 06:29:11,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [181.0, 143.0, 356.0, 197.0, 116.0, 250.0, 233.0, 395.0, 295.0, 258.0]
2025-09-12 06:29:11,181 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 18/100 (estimated time remaining: 17 hours, 40 minutes, 57 seconds)
2025-09-12 06:41:06,554 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:41:06,558 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:41:50,259 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 370.91342 ± 121.283
2025-09-12 06:41:50,259 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [483.36108, 307.68884, 186.27269, 267.69324, 463.75223, 399.45035, 380.34537, 578.5002, 202.46794, 439.60236]
2025-09-12 06:41:50,259 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [196.0, 145.0, 104.0, 129.0, 172.0, 156.0, 150.0, 241.0, 105.0, 177.0]
2025-09-12 06:41:50,275 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 19/100 (estimated time remaining: 17 hours, 25 minutes, 58 seconds)
2025-09-12 06:53:51,883 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:53:51,894 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:54:52,390 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 511.14258 ± 84.677
2025-09-12 06:54:52,391 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [486.16547, 576.455, 443.00616, 610.04535, 574.8363, 648.1234, 423.66034, 427.93735, 390.3067, 530.8894]
2025-09-12 06:54:52,391 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [204.0, 255.0, 194.0, 238.0, 245.0, 250.0, 208.0, 193.0, 190.0, 203.0]
2025-09-12 06:54:52,391 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1226 [INFO]: New best (511.14) for latency MM1Queue_a033_s075
2025-09-12 06:54:52,396 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 20/100 (estimated time remaining: 17 hours, 17 minutes, 3 seconds)
2025-09-12 07:06:44,507 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:06:44,511 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:07:38,223 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 418.80801 ± 152.212
2025-09-12 07:07:38,223 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [305.2826, 815.30914, 357.2338, 500.83215, 352.95724, 382.32266, 337.59857, 338.79492, 524.58136, 273.16785]
2025-09-12 07:07:38,223 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [154.0, 294.0, 193.0, 230.0, 167.0, 165.0, 176.0, 157.0, 225.0, 143.0]
2025-09-12 07:07:38,234 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 21/100 (estimated time remaining: 17 hours, 8 minutes, 23 seconds)
2025-09-12 07:19:29,221 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:19:29,223 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:20:40,128 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 659.46252 ± 150.417
2025-09-12 07:20:40,129 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [599.9877, 725.2412, 791.3811, 504.34402, 674.6365, 690.5099, 840.7972, 690.2293, 301.3071, 776.1909]
2025-09-12 07:20:40,129 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [250.0, 261.0, 266.0, 201.0, 265.0, 269.0, 326.0, 265.0, 153.0, 272.0]
2025-09-12 07:20:40,129 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1226 [INFO]: New best (659.46) for latency MM1Queue_a033_s075
2025-09-12 07:20:40,139 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 22/100 (estimated time remaining: 16 hours, 57 minutes, 21 seconds)
2025-09-12 07:32:26,869 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:32:26,877 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:33:41,026 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 742.35315 ± 222.974
2025-09-12 07:33:41,027 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [866.55255, 612.1738, 1004.5224, 1007.1559, 555.95074, 567.88885, 363.0788, 815.312, 1040.0662, 590.82983]
2025-09-12 07:33:41,027 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [295.0, 219.0, 350.0, 327.0, 211.0, 219.0, 167.0, 288.0, 340.0, 233.0]
2025-09-12 07:33:41,027 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1226 [INFO]: New best (742.35) for latency MM1Queue_a033_s075
2025-09-12 07:33:41,034 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 23/100 (estimated time remaining: 16 hours, 46 minutes, 9 seconds)
2025-09-12 07:45:42,000 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:45:42,001 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:47:10,682 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 924.62921 ± 257.246
2025-09-12 07:47:10,691 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [1433.2332, 804.92267, 512.5429, 827.8777, 777.89404, 903.1864, 932.2415, 1296.8218, 1028.4417, 729.1305]
2025-09-12 07:47:10,691 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [465.0, 280.0, 231.0, 289.0, 278.0, 322.0, 304.0, 422.0, 346.0, 273.0]
2025-09-12 07:47:10,691 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1226 [INFO]: New best (924.63) for latency MM1Queue_a033_s075
2025-09-12 07:47:10,736 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 24/100 (estimated time remaining: 16 hours, 46 minutes, 15 seconds)
2025-09-12 07:59:15,929 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:59:15,931 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:00:34,079 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 788.08356 ± 144.039
2025-09-12 08:00:34,082 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [944.0296, 724.0984, 654.9462, 763.89624, 937.5518, 447.53452, 857.80286, 899.4605, 859.8986, 791.6165]
2025-09-12 08:00:34,082 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [322.0, 250.0, 235.0, 260.0, 308.0, 186.0, 309.0, 343.0, 297.0, 285.0]
2025-09-12 08:00:34,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 25/100 (estimated time remaining: 16 hours, 38 minutes, 33 seconds)
2025-09-12 08:12:20,878 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:12:20,886 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:13:47,482 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 915.82831 ± 137.178
2025-09-12 08:13:47,482 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [905.8208, 970.06964, 1123.6742, 761.4271, 835.72687, 960.0263, 918.24554, 670.7774, 1135.3245, 877.19147]
2025-09-12 08:13:47,482 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [310.0, 327.0, 368.0, 257.0, 285.0, 322.0, 332.0, 256.0, 363.0, 291.0]
2025-09-12 08:13:47,492 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 26/100 (estimated time remaining: 16 hours, 32 minutes, 18 seconds)
2025-09-12 08:25:30,814 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:25:30,817 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:27:13,239 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 1091.00549 ± 285.744
2025-09-12 08:27:13,240 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [1144.4424, 1597.4043, 1066.98, 477.48257, 1111.5814, 1034.2084, 1430.7064, 1081.6973, 845.56256, 1119.99]
2025-09-12 08:27:13,240 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [364.0, 514.0, 365.0, 189.0, 390.0, 352.0, 489.0, 355.0, 278.0, 378.0]
2025-09-12 08:27:13,240 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1226 [INFO]: New best (1091.01) for latency MM1Queue_a033_s075
2025-09-12 08:27:13,248 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 27/100 (estimated time remaining: 16 hours, 24 minutes, 58 seconds)
2025-09-12 08:39:11,037 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:39:11,042 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:41:24,648 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 1495.00195 ± 528.064
2025-09-12 08:41:24,649 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [1412.3617, 1298.5671, 2784.4832, 1115.1633, 1309.5123, 1251.2836, 1356.4762, 1066.8263, 1142.4182, 2212.9282]
2025-09-12 08:41:24,649 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [462.0, 422.0, 802.0, 365.0, 397.0, 401.0, 450.0, 347.0, 377.0, 703.0]
2025-09-12 08:41:24,649 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1226 [INFO]: New best (1495.00) for latency MM1Queue_a033_s075
2025-09-12 08:41:24,656 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 28/100 (estimated time remaining: 16 hours, 28 minutes, 48 seconds)
2025-09-12 08:53:39,440 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:53:39,443 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:56:56,067 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 2082.14453 ± 848.685
2025-09-12 08:56:56,068 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3009.7886, 2677.487, 2602.888, 1572.4404, 1782.3694, 2540.8716, 203.71858, 1106.2272, 2686.8037, 2638.8523]
2025-09-12 08:56:56,068 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 951.0, 858.0, 532.0, 588.0, 813.0, 112.0, 386.0, 841.0, 889.0]
2025-09-12 08:56:56,068 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1226 [INFO]: New best (2082.14) for latency MM1Queue_a033_s075
2025-09-12 08:56:56,076 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 29/100 (estimated time remaining: 16 hours, 44 minutes, 28 seconds)
2025-09-12 09:08:36,725 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:08:36,731 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:10:10,325 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 1074.92017 ± 186.898
2025-09-12 09:10:10,327 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [1345.6888, 1051.5455, 1144.4202, 1413.9294, 1009.58575, 864.20184, 955.975, 981.6252, 1178.3564, 803.87274]
2025-09-12 09:10:10,327 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [407.0, 334.0, 356.0, 439.0, 318.0, 281.0, 300.0, 296.0, 371.0, 254.0]
2025-09-12 09:10:10,338 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 30/100 (estimated time remaining: 16 hours, 28 minutes, 22 seconds)
2025-09-12 09:21:47,174 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:21:47,179 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:23:11,398 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 907.71747 ± 200.512
2025-09-12 09:23:11,405 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [711.69794, 1065.3772, 827.7672, 1140.8695, 1310.1813, 643.7436, 897.2285, 929.579, 848.6306, 702.0996]
2025-09-12 09:23:11,405 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [257.0, 363.0, 272.0, 383.0, 424.0, 210.0, 278.0, 305.0, 300.0, 221.0]
2025-09-12 09:23:11,416 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 31/100 (estimated time remaining: 16 hours, 11 minutes, 34 seconds)
2025-09-12 09:35:42,945 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:35:42,959 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:38:50,298 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 2119.57495 ± 751.898
2025-09-12 09:38:50,301 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [1604.4294, 2893.3164, 2980.6882, 1747.9734, 1688.8027, 1936.05, 1238.0248, 1032.5842, 2880.713, 3193.168]
2025-09-12 09:38:50,302 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [459.0, 937.0, 1000.0, 531.0, 535.0, 609.0, 427.0, 347.0, 860.0, 1000.0]
2025-09-12 09:38:50,302 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1226 [INFO]: New best (2119.57) for latency MM1Queue_a033_s075
2025-09-12 09:38:50,312 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 32/100 (estimated time remaining: 16 hours, 28 minutes, 19 seconds)
2025-09-12 09:50:05,101 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:50:05,106 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:52:30,741 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 1723.33765 ± 540.835
2025-09-12 09:52:30,743 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [2001.7471, 2149.4707, 1355.7888, 1577.45, 1974.3153, 2894.461, 756.12555, 1496.6763, 1524.4603, 1502.8822]
2025-09-12 09:52:30,743 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [636.0, 615.0, 400.0, 452.0, 552.0, 847.0, 286.0, 439.0, 433.0, 452.0]
2025-09-12 09:52:30,762 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 33/100 (estimated time remaining: 16 hours, 6 minutes, 59 seconds)
2025-09-12 10:04:30,250 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:04:30,254 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:08:07,438 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 2504.97876 ± 964.273
2025-09-12 10:08:07,447 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3329.6807, 3349.8704, 1435.0256, 1115.8887, 746.40076, 3259.559, 2740.021, 3440.494, 2963.8413, 2669.0054]
2025-09-12 10:08:07,447 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 436.0, 388.0, 271.0, 1000.0, 882.0, 1000.0, 909.0, 742.0]
2025-09-12 10:08:07,447 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1226 [INFO]: New best (2504.98) for latency MM1Queue_a033_s075
2025-09-12 10:08:07,463 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 34/100 (estimated time remaining: 15 hours, 53 minutes, 56 seconds)
2025-09-12 10:20:38,900 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:20:38,912 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:25:12,880 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3309.58398 ± 294.108
2025-09-12 10:25:12,883 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3343.4536, 3303.6628, 3432.6484, 3510.0063, 3309.295, 3514.5427, 3229.8284, 3377.895, 2486.2847, 3588.2214]
2025-09-12 10:25:12,883 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 776.0, 1000.0]
2025-09-12 10:25:12,883 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1226 [INFO]: New best (3309.58) for latency MM1Queue_a033_s075
2025-09-12 10:25:12,906 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 35/100 (estimated time remaining: 16 hours, 30 minutes, 33 seconds)
2025-09-12 10:37:11,946 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:37:11,953 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:41:04,908 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 2692.58057 ± 588.580
2025-09-12 10:41:04,915 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3187.641, 3238.3315, 3129.114, 3210.103, 3121.6885, 2183.6016, 1369.0658, 2222.8357, 2773.1128, 2490.3115]
2025-09-12 10:41:04,915 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 669.0, 418.0, 676.0, 792.0, 711.0]
2025-09-12 10:41:04,926 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 36/100 (estimated time remaining: 16 hours, 52 minutes, 35 seconds)
2025-09-12 10:52:12,525 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:52:12,534 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:56:39,026 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 2983.51392 ± 357.369
2025-09-12 10:56:39,027 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3149.4404, 3274.86, 2103.8845, 2496.5618, 3116.516, 3107.282, 3200.8203, 3142.6934, 3050.234, 3192.845]
2025-09-12 10:56:39,027 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 666.0, 740.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 10:56:39,039 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 37/100 (estimated time remaining: 16 hours, 35 minutes, 59 seconds)
2025-09-12 11:09:25,174 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:09:25,179 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:13:58,060 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3116.25879 ± 212.899
2025-09-12 11:13:58,061 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3232.6887, 3026.4219, 3022.4575, 2549.0693, 3173.5635, 3351.49, 3290.5193, 3134.626, 3172.5254, 3209.2258]
2025-09-12 11:13:58,061 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 753.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 11:13:58,070 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 38/100 (estimated time remaining: 17 hours, 6 minutes, 20 seconds)
2025-09-12 11:24:57,180 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:24:57,191 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:27:46,690 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 2170.35205 ± 871.301
2025-09-12 11:27:46,692 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [2119.551, 1098.5046, 2270.149, 2672.394, 2189.3586, 973.2837, 3488.9487, 1277.6549, 3643.2126, 1970.4647]
2025-09-12 11:27:46,692 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [603.0, 342.0, 647.0, 751.0, 579.0, 299.0, 901.0, 388.0, 1000.0, 550.0]
2025-09-12 11:27:46,706 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 39/100 (estimated time remaining: 16 hours, 27 minutes, 42 seconds)
2025-09-12 11:39:49,789 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:39:49,801 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:43:29,247 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 2783.09912 ± 960.192
2025-09-12 11:43:29,249 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3552.7678, 3509.136, 956.4929, 3534.0046, 1321.6517, 3000.5793, 3606.5125, 1857.711, 3082.2878, 3409.849]
2025-09-12 11:43:29,249 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 307.0, 1000.0, 368.0, 820.0, 1000.0, 514.0, 842.0, 1000.0]
2025-09-12 11:43:29,266 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 40/100 (estimated time remaining: 15 hours, 54 minutes, 55 seconds)
2025-09-12 11:55:09,868 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:55:09,878 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:58:26,088 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 2486.75830 ± 1183.465
2025-09-12 11:58:26,089 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3523.3396, 1076.1493, 2716.925, 1040.3097, 3647.921, 3341.1997, 1165.2361, 1016.2975, 3736.1567, 3604.0479]
2025-09-12 11:58:26,089 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 330.0, 765.0, 325.0, 1000.0, 925.0, 363.0, 323.0, 1000.0, 1000.0]
2025-09-12 11:58:26,106 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 41/100 (estimated time remaining: 15 hours, 28 minutes, 14 seconds)
2025-09-12 12:11:20,168 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:11:20,181 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:15:09,925 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3059.08105 ± 878.062
2025-09-12 12:15:09,927 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3682.8262, 1561.7032, 3657.8438, 1940.5167, 1781.4602, 3835.0493, 3830.6658, 3201.0083, 3287.8003, 3811.936]
2025-09-12 12:15:09,927 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 463.0, 1000.0, 530.0, 497.0, 1000.0, 1000.0, 862.0, 914.0, 969.0]
2025-09-12 12:15:09,955 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 42/100 (estimated time remaining: 15 hours, 26 minutes, 28 seconds)
2025-09-12 12:26:21,936 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:26:21,942 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:30:40,992 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3156.73730 ± 912.603
2025-09-12 12:30:40,994 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3521.8757, 3541.449, 423.7888, 3459.0972, 3458.8708, 3371.7642, 3478.395, 3367.792, 3439.3306, 3505.0105]
2025-09-12 12:30:40,994 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 177.0, 1000.0, 1000.0, 948.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 12:30:41,006 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 43/100 (estimated time remaining: 14 hours, 49 minutes, 54 seconds)
2025-09-12 12:42:56,282 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:42:56,288 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:47:28,184 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3369.11646 ± 217.496
2025-09-12 12:47:28,186 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3344.4219, 3430.8008, 3504.081, 3390.7214, 3625.739, 3504.2698, 2906.0215, 3026.643, 3549.3452, 3409.119]
2025-09-12 12:47:28,186 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 837.0, 841.0, 1000.0, 1000.0]
2025-09-12 12:47:28,186 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1226 [INFO]: New best (3369.12) for latency MM1Queue_a033_s075
2025-09-12 12:47:28,194 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 44/100 (estimated time remaining: 15 hours, 8 minutes, 28 seconds)
2025-09-12 12:58:55,460 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:58:55,467 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:03:01,909 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3075.18506 ± 741.757
2025-09-12 13:03:01,917 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3196.574, 3524.3118, 1140.4008, 2272.4468, 3171.2249, 3363.925, 3479.7275, 3596.8528, 3542.1384, 3464.2476]
2025-09-12 13:03:01,917 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [911.0, 1000.0, 345.0, 664.0, 845.0, 927.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 13:03:01,927 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 45/100 (estimated time remaining: 14 hours, 50 minutes, 53 seconds)
2025-09-12 13:15:47,951 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:15:47,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:20:05,980 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3146.98291 ± 795.771
2025-09-12 13:20:05,984 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3602.474, 3434.0798, 848.9556, 2790.3848, 3417.0398, 3496.8247, 3541.6477, 3526.9563, 3456.5737, 3354.8928]
2025-09-12 13:20:05,984 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 265.0, 823.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 13:20:06,002 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 46/100 (estimated time remaining: 14 hours, 58 minutes, 18 seconds)
2025-09-12 13:31:56,008 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:31:56,017 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:35:25,076 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 2712.04102 ± 1012.097
2025-09-12 13:35:25,079 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [2785.6, 2015.64, 3704.0667, 2437.0574, 3666.4956, 430.98047, 3747.542, 2638.332, 3703.1384, 1991.5574]
2025-09-12 13:35:25,079 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [746.0, 564.0, 1000.0, 671.0, 1000.0, 164.0, 1000.0, 710.0, 1000.0, 533.0]
2025-09-12 13:35:25,106 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 47/100 (estimated time remaining: 14 hours, 26 minutes, 43 seconds)
2025-09-12 13:47:20,574 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:47:20,587 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:52:02,575 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3381.73389 ± 107.890
2025-09-12 13:52:02,577 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3523.4365, 3386.4258, 3228.4631, 3342.2463, 3398.4846, 3480.2725, 3423.8, 3422.9758, 3457.6064, 3153.629]
2025-09-12 13:52:02,577 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 927.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 13:52:02,577 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1226 [INFO]: New best (3381.73) for latency MM1Queue_a033_s075
2025-09-12 13:52:02,588 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 48/100 (estimated time remaining: 14 hours, 22 minutes, 24 seconds)
2025-09-12 14:03:06,303 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:03:06,313 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:07:25,628 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3255.60742 ± 685.865
2025-09-12 14:07:25,631 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [2502.278, 3741.6306, 1449.2651, 3668.6155, 3521.1392, 3504.094, 3590.227, 3482.6057, 3553.7976, 3542.4224]
2025-09-12 14:07:25,631 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [691.0, 1000.0, 451.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 14:07:25,696 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 49/100 (estimated time remaining: 13 hours, 51 minutes, 34 seconds)
2025-09-12 14:20:13,604 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:20:13,609 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:24:30,874 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3224.91650 ± 946.394
2025-09-12 14:24:30,875 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3525.2908, 3565.6008, 386.97165, 3591.523, 3513.5645, 3544.8862, 3571.4165, 3509.4304, 3540.947, 3499.5322]
2025-09-12 14:24:30,876 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 152.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 14:24:30,896 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 50/100 (estimated time remaining: 13 hours, 51 minutes, 7 seconds)
2025-09-12 14:36:22,359 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:36:22,371 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:40:50,283 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3476.70972 ± 624.781
2025-09-12 14:40:50,284 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [1608.786, 3751.4749, 3736.228, 3617.5715, 3657.8694, 3599.774, 3642.2888, 3689.951, 3710.207, 3752.9475]
2025-09-12 14:40:50,284 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [442.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 14:40:50,285 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1226 [INFO]: New best (3476.71) for latency MM1Queue_a033_s075
2025-09-12 14:40:50,297 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 51/100 (estimated time remaining: 13 hours, 27 minutes, 22 seconds)
2025-09-12 14:52:00,779 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:52:00,786 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:56:45,229 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3566.54175 ± 49.928
2025-09-12 14:56:45,239 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3611.0657, 3545.231, 3514.786, 3564.2932, 3580.9475, 3602.9822, 3598.9082, 3456.8293, 3637.3013, 3553.0754]
2025-09-12 14:56:45,239 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 14:56:45,239 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1226 [INFO]: New best (3566.54) for latency MM1Queue_a033_s075
2025-09-12 14:56:45,267 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 52/100 (estimated time remaining: 13 hours, 17 minutes, 5 seconds)
2025-09-12 15:09:06,592 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:09:06,604 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:13:38,087 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3483.14209 ± 297.372
2025-09-12 15:13:38,089 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3594.1875, 3614.3484, 3534.536, 3585.1746, 3543.3193, 3608.2478, 2594.2446, 3604.8853, 3584.482, 3567.998]
2025-09-12 15:13:38,089 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 677.0, 1000.0, 1000.0, 1000.0]
2025-09-12 15:13:38,101 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 53/100 (estimated time remaining: 13 hours, 3 minutes, 16 seconds)
2025-09-12 15:25:05,856 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:25:05,869 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:29:41,649 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3432.25146 ± 373.344
2025-09-12 15:29:41,652 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3548.2932, 3746.4382, 3490.497, 2355.9663, 3352.8225, 3607.262, 3632.1926, 3559.8755, 3433.539, 3595.6296]
2025-09-12 15:29:41,652 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 647.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 15:29:41,679 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 54/100 (estimated time remaining: 12 hours, 53 minutes, 18 seconds)
2025-09-12 15:41:41,342 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:41:41,346 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:45:39,281 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3136.08740 ± 855.916
2025-09-12 15:45:39,286 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [1518.4823, 3689.3816, 3743.0193, 3684.2244, 3753.9353, 1481.1375, 3702.394, 3225.7915, 2932.8052, 3629.7024]
2025-09-12 15:45:39,286 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [418.0, 1000.0, 1000.0, 1000.0, 1000.0, 423.0, 1000.0, 869.0, 803.0, 1000.0]
2025-09-12 15:45:39,364 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 55/100 (estimated time remaining: 12 hours, 26 minutes, 29 seconds)
2025-09-12 15:58:02,053 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:58:02,059 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:02:46,763 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3552.26807 ± 39.850
2025-09-12 16:02:46,764 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3581.5188, 3590.9763, 3604.0903, 3528.7405, 3488.0212, 3612.3608, 3529.9114, 3540.4216, 3534.0981, 3512.5403]
2025-09-12 16:02:46,764 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 16:02:46,772 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 56/100 (estimated time remaining: 12 hours, 17 minutes, 28 seconds)
2025-09-12 16:14:44,158 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:14:44,165 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:19:20,104 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3616.09131 ± 341.952
2025-09-12 16:19:20,120 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3776.409, 3728.5388, 3730.9, 2594.5095, 3692.629, 3748.8694, 3738.4448, 3658.811, 3736.6875, 3755.114]
2025-09-12 16:19:20,120 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 722.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 16:19:20,120 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1226 [INFO]: New best (3616.09) for latency MM1Queue_a033_s075
2025-09-12 16:19:20,136 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 57/100 (estimated time remaining: 12 hours, 6 minutes, 42 seconds)
2025-09-12 16:31:29,220 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:31:29,226 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:35:43,039 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3400.60791 ± 760.773
2025-09-12 16:35:43,041 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3822.554, 3772.2732, 3760.7412, 3702.2183, 1728.3258, 3851.6597, 3753.9717, 2047.1436, 3762.9854, 3804.209]
2025-09-12 16:35:43,041 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 481.0, 1000.0, 1000.0, 574.0, 1000.0, 1000.0]
2025-09-12 16:35:43,049 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 58/100 (estimated time remaining: 11 hours, 45 minutes, 54 seconds)
2025-09-12 16:47:39,171 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:47:39,177 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:52:17,619 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3802.72314 ± 45.178
2025-09-12 16:52:17,620 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3701.2283, 3806.6287, 3789.6226, 3790.6565, 3839.4524, 3788.8696, 3836.9097, 3846.1152, 3764.1936, 3863.552]
2025-09-12 16:52:17,620 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 16:52:17,620 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1226 [INFO]: New best (3802.72) for latency MM1Queue_a033_s075
2025-09-12 16:52:17,632 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 59/100 (estimated time remaining: 11 hours, 33 minutes, 50 seconds)
2025-09-12 17:04:19,149 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:04:19,154 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:07:38,520 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 2631.84717 ± 1341.080
2025-09-12 17:07:38,522 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [363.498, 3792.298, 2321.8843, 421.65652, 3824.5557, 1892.9722, 3757.9204, 2196.4548, 3843.6318, 3903.6025]
2025-09-12 17:07:38,522 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [146.0, 1000.0, 627.0, 154.0, 1000.0, 523.0, 1000.0, 575.0, 1000.0, 1000.0]
2025-09-12 17:07:38,535 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 60/100 (estimated time remaining: 11 hours, 12 minutes, 17 seconds)
2025-09-12 17:19:34,733 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:19:34,738 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:24:18,169 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3859.85400 ± 36.366
2025-09-12 17:24:18,174 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3911.4705, 3872.3826, 3870.4243, 3869.432, 3901.2998, 3855.77, 3790.8735, 3884.9087, 3826.428, 3815.5522]
2025-09-12 17:24:18,174 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 17:24:18,174 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1226 [INFO]: New best (3859.85) for latency MM1Queue_a033_s075
2025-09-12 17:24:18,224 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 61/100 (estimated time remaining: 10 hours, 52 minutes, 11 seconds)
2025-09-12 17:36:14,900 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:36:14,909 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:40:53,827 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3683.13525 ± 26.242
2025-09-12 17:40:53,840 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3681.4614, 3696.059, 3707.7712, 3640.551, 3731.6091, 3651.81, 3701.5007, 3658.4685, 3678.748, 3683.3748]
2025-09-12 17:40:53,840 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 17:40:53,854 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 62/100 (estimated time remaining: 10 hours, 36 minutes, 10 seconds)
2025-09-12 17:52:50,140 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:52:50,145 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:57:35,126 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3855.05811 ± 42.894
2025-09-12 17:57:35,127 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3899.634, 3821.0195, 3883.0786, 3873.8013, 3849.1775, 3868.1147, 3883.4922, 3787.4412, 3778.3347, 3906.4878]
2025-09-12 17:57:35,127 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 17:57:35,137 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 63/100 (estimated time remaining: 10 hours, 22 minutes, 11 seconds)
2025-09-12 18:09:32,631 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:09:32,636 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:14:13,031 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3916.90894 ± 135.319
2025-09-12 18:14:13,039 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [4000.6184, 3987.6245, 3962.886, 3892.5369, 3907.9392, 3984.164, 3901.1555, 3530.3555, 3998.6272, 4003.1797]
2025-09-12 18:14:13,039 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 875.0, 1000.0, 1000.0]
2025-09-12 18:14:13,039 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1226 [INFO]: New best (3916.91) for latency MM1Queue_a033_s075
2025-09-12 18:14:13,051 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 64/100 (estimated time remaining: 10 hours, 6 minutes, 14 seconds)
2025-09-12 18:25:20,468 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:25:20,481 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:30:01,093 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3812.98706 ± 52.740
2025-09-12 18:30:01,095 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3845.2017, 3820.8606, 3805.9614, 3830.0679, 3893.7786, 3846.567, 3774.4617, 3826.9397, 3803.2136, 3682.8147]
2025-09-12 18:30:01,095 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 18:30:01,109 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 65/100 (estimated time remaining: 9 hours, 53 minutes, 6 seconds)
2025-09-12 18:42:23,039 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:42:23,045 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:46:43,679 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3563.44336 ± 556.073
2025-09-12 18:46:43,686 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3846.7234, 3853.7747, 3831.227, 3822.7175, 3756.1055, 2335.6624, 3776.9597, 3893.1985, 2586.548, 3931.517]
2025-09-12 18:46:43,686 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 632.0, 1000.0, 1000.0, 687.0, 1000.0]
2025-09-12 18:46:43,699 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 66/100 (estimated time remaining: 9 hours, 36 minutes, 58 seconds)
2025-09-12 18:58:07,256 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:58:07,267 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:02:22,033 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3360.65308 ± 1055.418
2025-09-12 19:02:22,035 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3776.7876, 3728.7712, 3784.2688, 3812.4143, 3732.7153, 3814.5874, 3774.2576, 225.22823, 3669.1772, 3288.3237]
2025-09-12 19:02:22,035 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 101.0, 1000.0, 863.0]
2025-09-12 19:02:22,050 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 67/100 (estimated time remaining: 9 hours, 13 minutes, 59 seconds)
2025-09-12 19:14:18,984 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:14:18,991 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:18:58,686 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3736.27734 ± 41.637
2025-09-12 19:18:58,687 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3753.4583, 3647.8794, 3771.5212, 3687.0815, 3709.7554, 3786.631, 3742.8682, 3782.6484, 3733.5068, 3747.422]
2025-09-12 19:18:58,687 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 19:18:58,697 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 68/100 (estimated time remaining: 8 hours, 57 minutes, 11 seconds)
2025-09-12 19:30:54,572 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:30:54,578 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:35:26,627 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3717.05713 ± 391.995
2025-09-12 19:35:26,629 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3872.6462, 3880.6409, 3882.4521, 3877.0803, 2560.8306, 3881.314, 3819.987, 3885.421, 3638.9185, 3871.28]
2025-09-12 19:35:26,629 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 675.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 19:35:26,644 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 69/100 (estimated time remaining: 8 hours, 39 minutes, 50 seconds)
2025-09-12 19:47:41,261 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:47:41,265 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:52:24,500 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3928.42065 ± 44.172
2025-09-12 19:52:24,504 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3884.8584, 3860.9077, 3964.1907, 3989.4026, 3863.1584, 3952.5762, 3949.5823, 3975.804, 3907.0908, 3936.6372]
2025-09-12 19:52:24,504 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 19:52:24,504 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1226 [INFO]: New best (3928.42) for latency MM1Queue_a033_s075
2025-09-12 19:52:24,528 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 70/100 (estimated time remaining: 8 hours, 30 minutes, 49 seconds)
2025-09-12 20:04:43,691 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:04:43,696 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:09:23,965 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3854.63525 ± 45.890
2025-09-12 20:09:23,966 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3867.5938, 3851.0017, 3838.212, 3816.424, 3930.035, 3858.785, 3912.4524, 3754.9417, 3866.6643, 3850.2432]
2025-09-12 20:09:23,966 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:09:23,977 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 71/100 (estimated time remaining: 8 hours, 16 minutes, 1 second)
2025-09-12 20:21:19,087 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:21:19,099 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:26:04,158 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3923.67334 ± 65.670
2025-09-12 20:26:04,159 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3887.618, 3926.9165, 3802.952, 3903.572, 3968.9858, 3853.0774, 3931.5586, 4020.5957, 4025.114, 3916.3455]
2025-09-12 20:26:04,159 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:26:04,172 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 72/100 (estimated time remaining: 8 hours, 5 minutes, 28 seconds)
2025-09-12 20:38:00,001 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:38:00,009 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:42:44,268 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3846.97852 ± 35.360
2025-09-12 20:42:44,275 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3785.961, 3923.3093, 3859.0107, 3822.3882, 3854.2004, 3842.7507, 3852.4243, 3825.7542, 3823.675, 3880.309]
2025-09-12 20:42:44,275 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:42:44,286 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 73/100 (estimated time remaining: 7 hours, 49 minutes, 3 seconds)
2025-09-12 20:54:41,641 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:54:41,647 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:59:28,272 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3959.55029 ± 44.247
2025-09-12 20:59:28,286 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3970.04, 4003.0518, 3874.1313, 3956.532, 3941.2134, 3984.6414, 3953.5098, 3983.2776, 4030.337, 3898.7686]
2025-09-12 20:59:28,286 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:59:28,286 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1226 [INFO]: New best (3959.55) for latency MM1Queue_a033_s075
2025-09-12 20:59:28,303 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 74/100 (estimated time remaining: 7 hours, 33 minutes, 44 seconds)
2025-09-12 21:11:25,030 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:11:25,037 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:15:44,269 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3679.10400 ± 1027.965
2025-09-12 21:15:44,270 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [4089.6758, 4033.9062, 4073.2642, 3997.9265, 4046.1477, 4013.1638, 3951.0708, 598.4149, 4051.8025, 3935.6687]
2025-09-12 21:15:44,270 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 215.0, 1000.0, 1000.0]
2025-09-12 21:15:44,293 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 75/100 (estimated time remaining: 7 hours, 13 minutes, 18 seconds)
2025-09-12 21:27:38,812 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:27:38,818 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:32:19,218 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3942.92773 ± 45.106
2025-09-12 21:32:19,225 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [4016.288, 3876.1543, 3920.4143, 3963.1619, 3987.752, 3915.9104, 3929.003, 3879.5083, 3994.082, 3947.004]
2025-09-12 21:32:19,225 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:32:19,239 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 76/100 (estimated time remaining: 6 hours, 54 minutes, 36 seconds)
2025-09-12 21:44:16,185 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:44:16,191 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:48:57,593 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3988.45508 ± 44.096
2025-09-12 21:48:57,595 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [4016.8506, 3945.2224, 3993.4988, 4011.437, 4024.104, 3953.9011, 4025.4644, 3901.5193, 4052.039, 3960.5154]
2025-09-12 21:48:57,595 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:48:57,595 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1226 [INFO]: New best (3988.46) for latency MM1Queue_a033_s075
2025-09-12 21:48:57,609 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 77/100 (estimated time remaining: 6 hours, 37 minutes, 52 seconds)
2025-09-12 22:00:53,868 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:00:53,874 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:05:12,271 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3635.08398 ± 674.367
2025-09-12 22:05:12,278 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [4028.0034, 2399.324, 2189.6787, 3840.4739, 4006.9783, 3954.0276, 3945.398, 4001.9578, 3927.611, 4057.3884]
2025-09-12 22:05:12,278 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 628.0, 565.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:05:12,293 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 78/100 (estimated time remaining: 6 hours, 19 minutes, 20 seconds)
2025-09-12 22:16:30,565 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:16:30,570 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:21:11,813 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3855.82812 ± 46.397
2025-09-12 22:21:11,814 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3920.6147, 3945.0112, 3795.9846, 3884.8096, 3822.5076, 3857.2888, 3862.7014, 3817.3555, 3842.3699, 3809.6357]
2025-09-12 22:21:11,814 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:21:11,824 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 79/100 (estimated time remaining: 5 hours, 59 minutes, 35 seconds)
2025-09-12 22:33:09,018 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:33:09,027 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:37:53,670 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3973.44873 ± 82.159
2025-09-12 22:37:53,672 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3996.8442, 4022.01, 3844.0347, 3844.284, 3875.788, 4027.9888, 4025.3242, 4034.5408, 4083.8445, 3979.827]
2025-09-12 22:37:53,672 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:37:53,687 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 80/100 (estimated time remaining: 5 hours, 45 minutes, 3 seconds)
2025-09-12 22:49:52,300 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:49:52,307 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:54:32,529 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 4061.19678 ± 30.549
2025-09-12 22:54:32,530 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [4041.8945, 4031.3328, 4091.5999, 4096.8716, 4048.3162, 4031.0352, 4026.8816, 4112.7607, 4087.0776, 4044.1987]
2025-09-12 22:54:32,531 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:54:32,531 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1226 [INFO]: New best (4061.20) for latency MM1Queue_a033_s075
2025-09-12 22:54:32,547 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 81/100 (estimated time remaining: 5 hours, 28 minutes, 53 seconds)
2025-09-12 23:06:31,238 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:06:31,247 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 23:10:50,700 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3581.06909 ± 959.772
2025-09-12 23:10:50,706 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3925.8284, 3822.3308, 705.12213, 3962.6882, 3859.3015, 3850.3496, 3870.5422, 3965.6624, 3938.8018, 3910.0674]
2025-09-12 23:10:50,706 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 241.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 23:10:50,724 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 82/100 (estimated time remaining: 5 hours, 11 minutes, 9 seconds)
2025-09-12 23:22:49,105 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:22:49,113 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 23:27:14,758 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3659.75903 ± 900.913
2025-09-12 23:27:14,759 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3973.6042, 3955.1404, 3952.5852, 3981.5024, 3997.7488, 958.13135, 3950.0815, 3898.2495, 3948.6523, 3981.8958]
2025-09-12 23:27:14,759 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 313.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 23:27:14,771 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 83/100 (estimated time remaining: 4 hours, 55 minutes, 20 seconds)
2025-09-12 23:39:14,020 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:39:14,028 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 23:43:58,929 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3932.48511 ± 41.867
2025-09-12 23:43:58,930 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3917.6274, 3923.1567, 3963.1948, 4012.2393, 3963.2375, 3852.9011, 3919.1487, 3945.2131, 3942.5227, 3885.6113]
2025-09-12 23:43:58,930 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 23:43:58,957 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 84/100 (estimated time remaining: 4 hours, 41 minutes, 28 seconds)
2025-09-12 23:55:58,454 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:55:58,465 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 00:00:12,150 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3585.60742 ± 1184.995
2025-09-13 00:00:12,151 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3972.1655, 4008.1882, 3905.545, 3913.3767, 3977.7373, 4044.6238, 3992.9937, 32.745213, 3990.4546, 4018.2458]
2025-09-13 00:00:12,151 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 30.0, 1000.0, 1000.0]
2025-09-13 00:00:12,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 85/100 (estimated time remaining: 4 hours, 23 minutes, 23 seconds)
2025-09-13 00:12:07,631 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:12:07,637 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 00:16:48,749 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 4059.56982 ± 45.613
2025-09-13 00:16:48,752 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [4143.1294, 4089.316, 4007.6604, 4103.3574, 4079.85, 4035.5344, 4062.4758, 3978.4448, 4063.4377, 4032.4924]
2025-09-13 00:16:48,752 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 00:16:48,767 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 86/100 (estimated time remaining: 4 hours, 6 minutes, 48 seconds)
2025-09-13 00:28:44,404 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:28:44,410 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 00:33:28,091 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 4097.48389 ± 80.078
2025-09-13 00:33:28,094 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [4082.14, 4219.874, 4184.454, 4152.4067, 4083.796, 4138.3687, 4062.6233, 4034.2773, 3920.6836, 4096.219]
2025-09-13 00:33:28,094 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 00:33:28,094 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1226 [INFO]: New best (4097.48) for latency MM1Queue_a033_s075
2025-09-13 00:33:28,111 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 87/100 (estimated time remaining: 3 hours, 51 minutes, 20 seconds)
2025-09-13 00:45:26,035 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:45:26,042 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 00:49:44,854 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3690.20312 ± 988.684
2025-09-13 00:49:44,861 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [4099.648, 3598.2424, 4131.836, 4077.7676, 4000.3704, 4145.384, 757.57837, 4058.7122, 4043.7815, 3988.71]
2025-09-13 00:49:44,861 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 909.0, 1000.0, 1000.0, 1000.0, 1000.0, 250.0, 1000.0, 1000.0, 1000.0]
2025-09-13 00:49:44,886 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 88/100 (estimated time remaining: 3 hours, 34 minutes, 30 seconds)
2025-09-13 01:01:41,153 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:01:41,163 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 01:06:23,584 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 4029.28394 ± 98.984
2025-09-13 01:06:23,585 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [4061.8196, 3749.6067, 4082.6077, 4040.7583, 4108.5884, 3993.4683, 4027.6646, 4055.5635, 4108.0513, 4064.7126]
2025-09-13 01:06:23,585 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 01:06:23,608 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 89/100 (estimated time remaining: 3 hours, 17 minutes, 47 seconds)
2025-09-13 01:18:39,237 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:18:39,244 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 01:23:19,070 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 4087.34619 ± 39.575
2025-09-13 01:23:19,077 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [4100.251, 4094.1077, 4173.5386, 4020.0203, 4110.91, 4084.0518, 4107.6733, 4060.8098, 4046.5383, 4075.559]
2025-09-13 01:23:19,077 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 01:23:19,090 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 90/100 (estimated time remaining: 3 hours, 2 minutes, 51 seconds)
2025-09-13 01:35:14,923 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:35:14,932 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 01:39:48,073 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3901.58667 ± 273.303
2025-09-13 01:39:48,074 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [4025.8323, 3904.2698, 3970.3584, 3092.3772, 3948.7646, 4048.7542, 4004.0562, 3970.351, 3993.135, 4057.9688]
2025-09-13 01:39:48,074 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 780.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 01:39:48,093 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 91/100 (estimated time remaining: 2 hours, 45 minutes, 58 seconds)
2025-09-13 01:51:43,596 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:51:43,604 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 01:56:26,792 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3922.25977 ± 38.966
2025-09-13 01:56:26,794 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3936.2334, 3953.0305, 3930.5408, 3958.308, 3923.541, 3852.9333, 3941.4072, 3863.2952, 3975.5037, 3887.8042]
2025-09-13 01:56:26,794 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 01:56:26,811 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 92/100 (estimated time remaining: 2 hours, 29 minutes, 21 seconds)
2025-09-13 02:08:13,774 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:08:13,781 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 02:12:47,503 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3999.51367 ± 410.649
2025-09-13 02:12:47,505 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [4103.298, 4187.215, 4164.0034, 4103.4624, 4102.3267, 4049.7324, 4136.915, 4192.7905, 4180.741, 2774.651]
2025-09-13 02:12:47,505 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 688.0]
2025-09-13 02:12:47,532 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 93/100 (estimated time remaining: 2 hours, 12 minutes, 52 seconds)
2025-09-13 02:24:44,315 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:24:44,322 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 02:29:00,493 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3590.54053 ± 1137.303
2025-09-13 02:29:00,495 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3997.6807, 3976.2964, 3937.5706, 3636.9524, 3976.6987, 4062.3374, 3970.7139, 4115.6196, 4033.5269, 198.01234]
2025-09-13 02:29:00,495 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 100.0]
2025-09-13 02:29:00,520 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 55 minutes, 39 seconds)
2025-09-13 02:40:55,428 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:40:55,435 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 02:45:33,675 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 4161.97510 ± 172.319
2025-09-13 02:45:33,676 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [4270.3096, 4230.4683, 4214.2446, 3662.1333, 4153.189, 4294.837, 4181.9873, 4239.9, 4150.656, 4222.027]
2025-09-13 02:45:33,676 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 02:45:33,676 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1226 [INFO]: New best (4161.98) for latency MM1Queue_a033_s075
2025-09-13 02:45:33,693 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 38 minutes, 41 seconds)
2025-09-13 02:56:43,233 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:56:43,242 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 03:01:14,372 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3940.54956 ± 285.612
2025-09-13 03:01:14,373 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [4153.982, 4122.708, 4150.547, 3124.9658, 3972.8877, 4097.0776, 3948.4583, 3981.7764, 3930.4126, 3922.6807]
2025-09-13 03:01:14,373 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 784.0, 1000.0, 1000.0, 1000.0, 1000.0, 973.0, 984.0]
2025-09-13 03:01:14,389 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 21 minutes, 26 seconds)
2025-09-13 03:13:53,417 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:13:53,430 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 03:18:32,020 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 4103.94824 ± 38.718
2025-09-13 03:18:32,022 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [4057.513, 4088.2383, 4104.117, 4147.5547, 4051.9683, 4170.988, 4080.9282, 4091.8572, 4090.9424, 4155.3755]
2025-09-13 03:18:32,022 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 03:18:32,044 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 97/100 (estimated time remaining: 1 hour, 5 minutes, 40 seconds)
2025-09-13 03:30:28,754 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:30:28,761 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 03:35:13,475 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 4085.55347 ± 52.891
2025-09-13 03:35:13,484 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [4099.679, 4132.25, 4046.585, 3990.3499, 4091.1318, 4046.7542, 4136.14, 4027.1182, 4164.6313, 4120.893]
2025-09-13 03:35:13,484 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 03:35:13,565 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 98/100 (estimated time remaining: 49 minutes, 27 seconds)
2025-09-13 03:47:10,058 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:47:10,067 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 03:51:52,991 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 4040.59058 ± 55.682
2025-09-13 03:51:52,994 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [4050.586, 4030.0076, 3966.0413, 4026.9827, 4166.022, 3987.7776, 4071.241, 3976.4397, 4054.4023, 4076.406]
2025-09-13 03:51:52,994 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 03:51:53,023 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 99/100 (estimated time remaining: 33 minutes, 9 seconds)
2025-09-13 04:03:40,173 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:03:40,180 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 04:08:21,138 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 4039.20044 ± 92.830
2025-09-13 04:08:21,139 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [4051.8145, 4026.881, 4043.1418, 3784.723, 4069.74, 4026.0095, 4033.1047, 4105.8003, 4143.421, 4107.368]
2025-09-13 04:08:21,139 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 04:08:21,160 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 100/100 (estimated time remaining: 16 minutes, 33 seconds)
2025-09-13 04:20:18,784 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:20:18,790 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 04:25:03,261 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3985.97388 ± 30.408
2025-09-13 04:25:03,262 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3981.499, 3979.6938, 3992.099, 3989.1128, 3957.8242, 3927.0808, 3983.4666, 3979.4985, 4040.8987, 4028.5667]
2025-09-13 04:25:03,262 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 04:25:03,280 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1251 [DEBUG]: Training session finished
