2025-09-12 00:18:50,876 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noiseperc15-hopper/MM1Queue_a033_s075-mbpac_memdelay
2025-09-12 00:18:50,876 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noiseperc15-hopper/MM1Queue_a033_s075-mbpac_memdelay
2025-09-12 00:18:50,876 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x152cd3b56550>}
2025-09-12 00:18:50,876 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1111 [DEBUG]: using device: cuda
2025-09-12 00:18:50,881 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1133 [INFO]: Creating new trainer
2025-09-12 00:18:50,899 baseline-mbpac-noiseperc15-hopper:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2.]]), shift: tensor([[-1., -1., -1.]]))
)
2025-09-12 00:18:50,899 baseline-mbpac-noiseperc15-hopper:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=14, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-12 00:18:50,908 baseline-mbpac-noiseperc15-hopper:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=11, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=11, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=11, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=3, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 384, batch_first=True)
)
2025-09-12 00:18:51,901 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1194 [DEBUG]: Starting training session...
2025-09-12 00:18:51,901 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 1/100
2025-09-12 00:29:33,278 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:29:33,279 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 00:29:49,433 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 101.28661 ± 18.640
2025-09-12 00:29:49,445 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [97.6535, 97.483345, 113.551735, 93.11078, 94.10962, 111.93098, 117.15034, 135.11359, 61.74488, 91.017365]
2025-09-12 00:29:49,445 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [54.0, 55.0, 63.0, 52.0, 54.0, 61.0, 63.0, 73.0, 35.0, 52.0]
2025-09-12 00:29:49,445 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1226 [INFO]: New best (101.29) for latency MM1Queue_a033_s075
2025-09-12 00:29:49,456 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 2/100 (estimated time remaining: 18 hours, 4 minutes, 57 seconds)
2025-09-12 00:42:10,847 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:42:10,850 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 00:42:43,476 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 143.22993 ± 112.656
2025-09-12 00:42:43,477 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [267.3156, 102.44718, 343.88086, 234.76956, 61.79, 252.64136, 64.21486, 52.155743, 6.063711, 47.020367]
2025-09-12 00:42:43,477 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [212.0, 85.0, 254.0, 167.0, 58.0, 197.0, 63.0, 43.0, 10.0, 41.0]
2025-09-12 00:42:43,477 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1226 [INFO]: New best (143.23) for latency MM1Queue_a033_s075
2025-09-12 00:42:43,493 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 3/100 (estimated time remaining: 19 hours, 29 minutes, 8 seconds)
2025-09-12 00:54:42,594 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:54:42,596 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 00:55:18,882 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 215.85547 ± 64.262
2025-09-12 00:55:18,882 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [222.94748, 193.96231, 270.03885, 136.9195, 246.62932, 145.19244, 336.1313, 118.079094, 241.92934, 246.72485]
2025-09-12 00:55:18,882 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [127.0, 111.0, 144.0, 96.0, 135.0, 105.0, 195.0, 76.0, 127.0, 134.0]
2025-09-12 00:55:18,882 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1226 [INFO]: New best (215.86) for latency MM1Queue_a033_s075
2025-09-12 00:55:18,888 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 4/100 (estimated time remaining: 19 hours, 38 minutes, 32 seconds)
2025-09-12 01:07:26,466 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:07:26,469 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:08:14,387 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 268.73486 ± 124.860
2025-09-12 01:08:14,387 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [74.99687, 421.5717, 220.42651, 361.92752, 249.50633, 434.73822, 320.02832, 40.328335, 260.52777, 303.2969]
2025-09-12 01:08:14,387 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [50.0, 277.0, 156.0, 208.0, 178.0, 266.0, 176.0, 27.0, 132.0, 164.0]
2025-09-12 01:08:14,387 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1226 [INFO]: New best (268.73) for latency MM1Queue_a033_s075
2025-09-12 01:08:14,392 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 5/100 (estimated time remaining: 19 hours, 44 minutes, 59 seconds)
2025-09-12 01:20:39,495 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:20:39,524 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:21:06,987 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 197.64653 ± 144.739
2025-09-12 01:21:06,987 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [242.89394, 392.97836, 43.739677, 71.531395, 365.96127, 46.51628, 46.923447, 317.06833, 366.05267, 82.8]
2025-09-12 01:21:06,987 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [110.0, 207.0, 31.0, 44.0, 173.0, 32.0, 32.0, 122.0, 149.0, 48.0]
2025-09-12 01:21:06,994 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 6/100 (estimated time remaining: 19 hours, 42 minutes, 46 seconds)
2025-09-12 01:33:08,857 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:33:08,865 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:33:55,151 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 286.24615 ± 93.535
2025-09-12 01:33:55,152 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [355.25812, 212.00462, 364.59967, 329.77438, 423.74283, 217.17436, 141.17133, 146.67233, 339.1099, 332.95444]
2025-09-12 01:33:55,152 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [208.0, 158.0, 210.0, 147.0, 250.0, 138.0, 76.0, 76.0, 156.0, 157.0]
2025-09-12 01:33:55,152 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1226 [INFO]: New best (286.25) for latency MM1Queue_a033_s075
2025-09-12 01:33:55,156 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 7/100 (estimated time remaining: 20 hours, 4 minutes, 59 seconds)
2025-09-12 01:46:09,162 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:46:09,169 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:46:50,343 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 267.95755 ± 176.541
2025-09-12 01:46:50,344 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [99.43035, 446.95868, 396.1589, 61.823776, 609.497, 330.74838, 258.42426, 61.711285, 103.14208, 311.68088]
2025-09-12 01:46:50,344 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [58.0, 233.0, 187.0, 40.0, 363.0, 138.0, 122.0, 39.0, 59.0, 174.0]
2025-09-12 01:46:50,353 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 8/100 (estimated time remaining: 19 hours, 52 minutes, 31 seconds)
2025-09-12 01:59:09,566 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:59:09,574 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:59:56,999 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 338.08563 ± 113.795
2025-09-12 01:59:57,000 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [209.87468, 246.53998, 357.2575, 463.3103, 459.7997, 292.72034, 446.9595, 236.17384, 491.40396, 176.81648]
2025-09-12 01:59:57,000 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [106.0, 139.0, 169.0, 219.0, 183.0, 124.0, 208.0, 122.0, 226.0, 103.0]
2025-09-12 01:59:57,000 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1226 [INFO]: New best (338.09) for latency MM1Queue_a033_s075
2025-09-12 01:59:57,007 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 9/100 (estimated time remaining: 19 hours, 49 minutes, 17 seconds)
2025-09-12 02:12:16,402 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:12:16,410 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:12:54,777 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 259.92566 ± 178.453
2025-09-12 02:12:54,778 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [120.30497, 68.48724, 219.44586, 122.14053, 585.7083, 559.05756, 133.88362, 121.68728, 337.667, 330.87424]
2025-09-12 02:12:54,778 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [76.0, 43.0, 115.0, 70.0, 270.0, 253.0, 94.0, 76.0, 148.0, 164.0]
2025-09-12 02:12:54,786 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 10/100 (estimated time remaining: 19 hours, 37 minutes, 3 seconds)
2025-09-12 02:25:07,116 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:25:07,138 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:26:08,307 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 568.19934 ± 261.579
2025-09-12 02:26:08,310 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [116.692245, 248.71822, 467.35886, 622.8485, 596.1714, 734.2929, 839.76166, 369.8459, 1033.5421, 652.76184]
2025-09-12 02:26:08,310 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [61.0, 118.0, 175.0, 215.0, 208.0, 256.0, 303.0, 164.0, 395.0, 222.0]
2025-09-12 02:26:08,310 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1226 [INFO]: New best (568.20) for latency MM1Queue_a033_s075
2025-09-12 02:26:08,327 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 11/100 (estimated time remaining: 19 hours, 30 minutes, 24 seconds)
2025-09-12 02:38:23,302 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:38:23,311 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:39:13,967 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 459.46027 ± 256.575
2025-09-12 02:39:13,968 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [746.565, 87.04582, 172.89632, 190.14537, 488.49487, 643.814, 186.00348, 638.0559, 673.6473, 767.93463]
2025-09-12 02:39:13,968 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [245.0, 49.0, 89.0, 102.0, 201.0, 213.0, 88.0, 218.0, 232.0, 267.0]
2025-09-12 02:39:13,976 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 12/100 (estimated time remaining: 19 hours, 22 minutes, 34 seconds)
2025-09-12 02:51:46,998 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:51:47,001 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:52:22,875 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 307.13239 ± 221.372
2025-09-12 02:52:22,875 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [125.72698, 158.82993, 579.33826, 546.13684, 69.71551, 162.44855, 146.04639, 734.72485, 362.71594, 185.64053]
2025-09-12 02:52:22,876 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [68.0, 77.0, 200.0, 192.0, 43.0, 84.0, 75.0, 251.0, 147.0, 89.0]
2025-09-12 02:52:22,890 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 13/100 (estimated time remaining: 19 hours, 13 minutes, 32 seconds)
2025-09-12 03:04:25,866 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:04:25,872 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:05:19,188 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 488.67780 ± 279.134
2025-09-12 03:05:19,188 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [676.90405, 184.45607, 830.1614, 206.45639, 728.65845, 108.32784, 674.9051, 708.61615, 112.3454, 655.94714]
2025-09-12 03:05:19,188 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [246.0, 97.0, 301.0, 97.0, 253.0, 59.0, 242.0, 252.0, 61.0, 232.0]
2025-09-12 03:05:19,195 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 14/100 (estimated time remaining: 18 hours, 57 minutes, 26 seconds)
2025-09-12 03:17:43,833 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:17:43,836 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:18:35,438 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 498.40820 ± 311.049
2025-09-12 03:18:35,438 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [26.803108, 814.5099, 478.028, 1037.455, 569.7742, 354.9825, 77.91575, 231.92772, 662.0399, 730.646]
2025-09-12 03:18:35,438 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [23.0, 255.0, 180.0, 371.0, 194.0, 135.0, 46.0, 103.0, 218.0, 233.0]
2025-09-12 03:18:35,446 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 15/100 (estimated time remaining: 18 hours, 49 minutes, 39 seconds)
2025-09-12 03:30:22,285 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:30:22,293 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:31:24,372 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 568.53436 ± 492.632
2025-09-12 03:31:24,372 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [643.2467, 740.21045, 731.1666, 97.70291, 253.76337, 67.5036, 1077.0641, 256.85892, 132.3485, 1685.4786]
2025-09-12 03:31:24,372 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [232.0, 259.0, 277.0, 55.0, 125.0, 43.0, 334.0, 130.0, 71.0, 601.0]
2025-09-12 03:31:24,372 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1226 [INFO]: New best (568.53) for latency MM1Queue_a033_s075
2025-09-12 03:31:24,386 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 16/100 (estimated time remaining: 18 hours, 29 minutes, 32 seconds)
2025-09-12 03:43:28,991 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:43:29,000 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:44:17,852 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 432.67035 ± 188.975
2025-09-12 03:44:17,852 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [629.2489, 337.1231, 177.263, 499.59286, 225.01387, 751.2774, 408.97488, 665.53577, 397.40457, 235.26953]
2025-09-12 03:44:17,852 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [232.0, 139.0, 85.0, 181.0, 103.0, 277.0, 168.0, 261.0, 171.0, 110.0]
2025-09-12 03:44:17,857 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 17/100 (estimated time remaining: 18 hours, 13 minutes, 5 seconds)
2025-09-12 03:56:52,994 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:56:52,999 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:58:15,537 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 757.16150 ± 431.523
2025-09-12 03:58:15,538 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [608.14264, 1085.5881, 394.2621, 390.8939, 1581.0538, 310.24176, 1203.2755, 214.60594, 721.4237, 1062.1277]
2025-09-12 03:58:15,539 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [239.0, 384.0, 148.0, 163.0, 575.0, 139.0, 421.0, 101.0, 277.0, 396.0]
2025-09-12 03:58:15,539 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1226 [INFO]: New best (757.16) for latency MM1Queue_a033_s075
2025-09-12 03:58:15,563 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 18/100 (estimated time remaining: 18 hours, 13 minutes, 34 seconds)
2025-09-12 04:09:41,301 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:09:41,304 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:11:06,464 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 833.20056 ± 539.488
2025-09-12 04:11:06,466 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [418.0609, 629.2627, 696.0627, 551.1796, 1163.2827, 464.0271, 500.71198, 923.04504, 667.60376, 2318.7693]
2025-09-12 04:11:06,466 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [155.0, 228.0, 252.0, 220.0, 393.0, 181.0, 205.0, 323.0, 234.0, 776.0]
2025-09-12 04:11:06,466 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1226 [INFO]: New best (833.20) for latency MM1Queue_a033_s075
2025-09-12 04:11:06,474 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 19/100 (estimated time remaining: 17 hours, 58 minutes, 55 seconds)
2025-09-12 04:23:18,839 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:23:18,841 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:24:41,023 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 736.00549 ± 480.569
2025-09-12 04:24:41,040 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [876.2369, 1358.938, 531.93054, 1149.2249, 581.4186, 767.87225, 160.19955, 1574.279, 256.98846, 102.9674]
2025-09-12 04:24:41,040 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [336.0, 500.0, 219.0, 424.0, 222.0, 305.0, 82.0, 596.0, 131.0, 58.0]
2025-09-12 04:24:41,058 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 20/100 (estimated time remaining: 17 hours, 50 minutes, 42 seconds)
2025-09-12 04:36:55,813 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:36:55,816 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:38:41,433 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 988.28601 ± 660.233
2025-09-12 04:38:41,444 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [570.19165, 1096.1356, 1998.3728, 2142.2927, 256.0773, 1031.2931, 1168.1672, 179.0828, 245.12227, 1196.1254]
2025-09-12 04:38:41,444 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [196.0, 357.0, 724.0, 744.0, 132.0, 374.0, 450.0, 92.0, 132.0, 437.0]
2025-09-12 04:38:41,444 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1226 [INFO]: New best (988.29) for latency MM1Queue_a033_s075
2025-09-12 04:38:41,461 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 21/100 (estimated time remaining: 17 hours, 56 minutes, 33 seconds)
2025-09-12 04:50:57,153 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:50:57,159 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:52:28,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 813.57574 ± 591.603
2025-09-12 04:52:28,847 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [99.415306, 591.18036, 872.58026, 394.32376, 1681.4347, 819.22723, 1595.0227, 432.4695, 1616.1992, 33.90403]
2025-09-12 04:52:28,848 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [54.0, 239.0, 346.0, 174.0, 656.0, 289.0, 585.0, 191.0, 637.0, 27.0]
2025-09-12 04:52:28,854 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 22/100 (estimated time remaining: 17 hours, 57 minutes, 17 seconds)
2025-09-12 05:04:06,318 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:04:06,320 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:05:30,867 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 740.01312 ± 598.567
2025-09-12 05:05:30,880 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [1252.3628, 2086.7778, 318.14142, 1292.2906, 558.04474, 461.48843, 409.68607, 804.57715, 79.475624, 137.28682]
2025-09-12 05:05:30,880 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [440.0, 836.0, 153.0, 501.0, 230.0, 193.0, 166.0, 322.0, 46.0, 72.0]
2025-09-12 05:05:30,887 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 23/100 (estimated time remaining: 17 hours, 29 minutes, 11 seconds)
2025-09-12 05:17:38,753 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:17:38,757 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:19:09,093 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 798.16571 ± 449.811
2025-09-12 05:19:09,106 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [1281.2537, 1163.2382, 142.44626, 175.43076, 302.35736, 1035.2582, 1061.0051, 1448.082, 585.45966, 787.12665]
2025-09-12 05:19:09,106 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [460.0, 420.0, 74.0, 88.0, 140.0, 405.0, 409.0, 551.0, 250.0, 325.0]
2025-09-12 05:19:09,118 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 24/100 (estimated time remaining: 17 hours, 27 minutes, 52 seconds)
2025-09-12 05:31:38,016 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:31:38,022 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:32:38,984 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 570.64355 ± 413.451
2025-09-12 05:32:38,984 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [832.5735, 1493.1865, 721.2201, 192.03851, 256.73184, 201.29675, 891.1052, 135.59036, 286.1481, 696.54474]
2025-09-12 05:32:38,984 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [320.0, 483.0, 251.0, 97.0, 122.0, 103.0, 285.0, 68.0, 128.0, 253.0]
2025-09-12 05:32:39,005 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 25/100 (estimated time remaining: 17 hours, 13 minutes, 4 seconds)
2025-09-12 05:44:23,050 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:44:23,054 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:45:45,569 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 788.42938 ± 381.730
2025-09-12 05:45:45,576 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [761.7534, 179.68498, 1044.5426, 1084.7395, 642.64026, 977.5272, 1047.5897, 619.71783, 139.12143, 1386.9766]
2025-09-12 05:45:45,576 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [295.0, 89.0, 374.0, 361.0, 244.0, 332.0, 389.0, 231.0, 72.0, 437.0]
2025-09-12 05:45:45,582 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 26/100 (estimated time remaining: 16 hours, 46 minutes, 1 second)
2025-09-12 05:58:14,328 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:58:14,339 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:58:59,475 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 417.66913 ± 390.464
2025-09-12 05:58:59,476 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [71.464935, 195.96687, 777.30493, 193.12813, 224.57109, 199.00919, 226.04639, 1335.032, 797.5689, 156.59909]
2025-09-12 05:58:59,476 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [46.0, 99.0, 252.0, 99.0, 104.0, 98.0, 109.0, 412.0, 260.0, 79.0]
2025-09-12 05:58:59,483 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 27/100 (estimated time remaining: 16 hours, 24 minutes, 21 seconds)
2025-09-12 06:10:35,352 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:10:35,354 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:12:21,105 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 1091.12012 ± 603.959
2025-09-12 06:12:21,107 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [308.4948, 720.8697, 1016.2395, 609.37714, 1205.3734, 2075.0198, 1308.5511, 1096.9025, 402.9599, 2167.4126]
2025-09-12 06:12:21,107 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [131.0, 239.0, 309.0, 229.0, 434.0, 665.0, 413.0, 354.0, 157.0, 711.0]
2025-09-12 06:12:21,107 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1226 [INFO]: New best (1091.12) for latency MM1Queue_a033_s075
2025-09-12 06:12:21,115 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 28/100 (estimated time remaining: 16 hours, 15 minutes, 49 seconds)
2025-09-12 06:24:28,487 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:24:28,489 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:25:33,280 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 597.13220 ± 550.433
2025-09-12 06:25:33,281 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [260.02753, 110.043, 133.68391, 529.8002, 730.6872, 529.7338, 819.1941, 705.91003, 2055.1926, 97.04943]
2025-09-12 06:25:33,281 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [113.0, 61.0, 67.0, 192.0, 269.0, 197.0, 282.0, 266.0, 709.0, 55.0]
2025-09-12 06:25:33,286 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 29/100 (estimated time remaining: 15 hours, 56 minutes, 12 seconds)
2025-09-12 06:37:55,075 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:37:55,076 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:38:58,079 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 607.05609 ± 468.078
2025-09-12 06:38:58,079 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [60.54421, 301.1215, 418.32187, 953.5078, 555.8423, 1671.7961, 437.3035, 1060.3511, 92.51298, 519.25977]
2025-09-12 06:38:58,079 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [39.0, 122.0, 166.0, 325.0, 211.0, 512.0, 169.0, 379.0, 52.0, 195.0]
2025-09-12 06:38:58,118 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 30/100 (estimated time remaining: 15 hours, 41 minutes, 43 seconds)
2025-09-12 06:51:39,110 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:51:39,119 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:53:01,835 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 778.92963 ± 693.675
2025-09-12 06:53:01,836 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [318.71213, 109.950035, 2178.1917, 654.8336, 340.32703, 71.02932, 1777.9946, 1202.1804, 899.2977, 236.77977]
2025-09-12 06:53:01,836 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [138.0, 60.0, 757.0, 247.0, 144.0, 44.0, 588.0, 449.0, 336.0, 108.0]
2025-09-12 06:53:01,864 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 31/100 (estimated time remaining: 15 hours, 41 minutes, 47 seconds)
2025-09-12 07:04:22,031 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:04:22,033 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:06:00,540 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 974.12097 ± 786.193
2025-09-12 07:06:00,541 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [2672.0652, 1376.8573, 1119.4581, 632.95996, 1635.691, 359.36945, 88.89103, 294.84897, 123.770294, 1437.299]
2025-09-12 07:06:00,541 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [904.0, 471.0, 399.0, 226.0, 533.0, 144.0, 53.0, 128.0, 65.0, 494.0]
2025-09-12 07:06:00,547 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 32/100 (estimated time remaining: 15 hours, 24 minutes, 50 seconds)
2025-09-12 07:18:04,046 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:18:04,048 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:19:29,305 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 863.49316 ± 729.610
2025-09-12 07:19:29,306 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [113.14434, 1262.9932, 70.77504, 670.465, 476.18808, 1682.1351, 1805.3464, 150.69275, 2065.0305, 338.16086]
2025-09-12 07:19:29,306 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [59.0, 402.0, 41.0, 247.0, 183.0, 572.0, 624.0, 74.0, 680.0, 139.0]
2025-09-12 07:19:29,316 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 33/100 (estimated time remaining: 15 hours, 13 minutes, 3 seconds)
2025-09-12 07:32:15,379 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:32:15,383 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:33:35,810 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 778.39325 ± 671.250
2025-09-12 07:33:35,811 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [288.99316, 637.2343, 1566.0726, 299.53616, 219.73257, 425.69583, 389.99753, 821.3089, 2451.2664, 684.095]
2025-09-12 07:33:35,811 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [118.0, 243.0, 503.0, 127.0, 107.0, 168.0, 160.0, 315.0, 791.0, 255.0]
2025-09-12 07:33:35,840 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 34/100 (estimated time remaining: 15 hours, 11 minutes, 46 seconds)
2025-09-12 07:45:30,901 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:45:30,904 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:47:22,880 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 1114.24402 ± 712.381
2025-09-12 07:47:22,881 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [815.84753, 2139.047, 994.1775, 1562.0284, 1085.0653, 1992.0907, 310.91562, 1870.1097, 262.1263, 111.03204]
2025-09-12 07:47:22,881 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [260.0, 761.0, 371.0, 557.0, 338.0, 643.0, 137.0, 658.0, 108.0, 60.0]
2025-09-12 07:47:22,881 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1226 [INFO]: New best (1114.24) for latency MM1Queue_a033_s075
2025-09-12 07:47:22,888 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 35/100 (estimated time remaining: 15 hours, 3 minutes, 2 seconds)
2025-09-12 07:59:09,470 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:59:09,473 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:00:38,130 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 848.39081 ± 636.673
2025-09-12 08:00:38,131 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [302.71564, 665.98505, 203.85806, 509.20508, 1465.1844, 303.15872, 1378.4155, 2248.9653, 1042.0419, 364.37814]
2025-09-12 08:00:38,131 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [136.0, 245.0, 95.0, 197.0, 497.0, 130.0, 482.0, 726.0, 378.0, 150.0]
2025-09-12 08:00:38,138 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 36/100 (estimated time remaining: 14 hours, 38 minutes, 51 seconds)
2025-09-12 08:12:38,882 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:12:38,885 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:13:25,385 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 416.28882 ± 322.112
2025-09-12 08:13:25,385 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [437.26663, 139.75847, 754.639, 623.9762, 259.53467, 13.387508, 128.64658, 71.53158, 930.8109, 803.33655]
2025-09-12 08:13:25,385 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [172.0, 72.0, 270.0, 221.0, 115.0, 14.0, 68.0, 43.0, 333.0, 290.0]
2025-09-12 08:13:25,394 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 37/100 (estimated time remaining: 14 hours, 22 minutes, 54 seconds)
2025-09-12 08:26:02,500 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:26:02,504 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:28:02,911 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 1203.16125 ± 846.483
2025-09-12 08:28:02,912 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [2490.6104, 1033.8312, 626.03436, 2950.9324, 161.63165, 519.64575, 553.40326, 957.8062, 1401.0983, 1336.6195]
2025-09-12 08:28:02,912 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [846.0, 365.0, 228.0, 1000.0, 78.0, 196.0, 202.0, 336.0, 442.0, 453.0]
2025-09-12 08:28:02,912 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1226 [INFO]: New best (1203.16) for latency MM1Queue_a033_s075
2025-09-12 08:28:02,921 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 38/100 (estimated time remaining: 14 hours, 23 minutes, 51 seconds)
2025-09-12 08:39:58,061 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:39:58,064 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:41:29,896 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 860.78711 ± 788.211
2025-09-12 08:41:29,900 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [861.167, 898.1739, 187.89516, 68.54207, 975.3167, 1428.2332, 385.7618, 735.6542, 182.77617, 2884.3508]
2025-09-12 08:41:29,900 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [318.0, 326.0, 92.0, 43.0, 348.0, 505.0, 163.0, 268.0, 87.0, 1000.0]
2025-09-12 08:41:29,925 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 39/100 (estimated time remaining: 14 hours, 1 minute, 58 seconds)
2025-09-12 08:54:13,088 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:54:13,093 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:56:48,801 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 1554.90454 ± 1129.988
2025-09-12 08:56:48,802 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [2617.9263, 2395.182, 2733.711, 135.6969, 657.18634, 2953.118, 163.07048, 1098.7192, 262.98486, 2531.4507]
2025-09-12 08:56:48,802 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [903.0, 797.0, 904.0, 69.0, 244.0, 1000.0, 80.0, 399.0, 118.0, 847.0]
2025-09-12 08:56:48,802 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1226 [INFO]: New best (1554.90) for latency MM1Queue_a033_s075
2025-09-12 08:56:48,813 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 40/100 (estimated time remaining: 14 hours, 7 minutes, 4 seconds)
2025-09-12 09:08:16,577 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:08:16,585 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:09:36,134 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 781.97522 ± 696.899
2025-09-12 09:09:36,136 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [50.692593, 1865.969, 101.44775, 1360.5931, 1959.4106, 418.3595, 1099.1743, 472.03235, 73.94122, 418.1318]
2025-09-12 09:09:36,136 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [36.0, 641.0, 56.0, 463.0, 632.0, 157.0, 388.0, 191.0, 46.0, 171.0]
2025-09-12 09:09:36,146 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 41/100 (estimated time remaining: 13 hours, 47 minutes, 36 seconds)
2025-09-12 09:21:22,228 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:21:22,232 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:22:04,986 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 368.56909 ± 377.878
2025-09-12 09:22:04,986 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [1192.0044, 865.16736, 85.833206, 227.66747, 78.04966, 57.44346, 69.18979, 73.751854, 497.19736, 539.3862]
2025-09-12 09:22:04,986 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [430.0, 315.0, 51.0, 107.0, 47.0, 38.0, 44.0, 45.0, 190.0, 205.0]
2025-09-12 09:22:04,995 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 42/100 (estimated time remaining: 13 hours, 30 minutes, 11 seconds)
2025-09-12 09:34:26,654 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:34:26,657 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:35:49,857 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 852.26532 ± 479.873
2025-09-12 09:35:49,859 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [201.31688, 990.99384, 1301.863, 1117.0826, 1250.9694, 1140.1449, 1410.2067, 110.16483, 843.06055, 156.85036]
2025-09-12 09:35:49,859 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [93.0, 333.0, 450.0, 366.0, 418.0, 383.0, 444.0, 61.0, 278.0, 76.0]
2025-09-12 09:35:49,866 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 43/100 (estimated time remaining: 13 hours, 6 minutes, 16 seconds)
2025-09-12 09:47:46,533 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:47:46,536 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:49:21,318 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 948.39374 ± 871.799
2025-09-12 09:49:21,319 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [345.10352, 281.60608, 1228.736, 2144.5723, 826.76324, 204.3154, 2471.9685, 1855.6284, 49.484867, 75.75858]
2025-09-12 09:49:21,319 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [146.0, 129.0, 420.0, 742.0, 248.0, 97.0, 796.0, 613.0, 30.0, 45.0]
2025-09-12 09:49:21,326 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 44/100 (estimated time remaining: 12 hours, 53 minutes, 33 seconds)
2025-09-12 10:01:31,366 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:01:31,369 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:02:38,823 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 666.22614 ± 441.777
2025-09-12 10:02:38,824 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [479.09824, 239.29974, 700.79803, 236.74246, 1493.566, 1160.4745, 74.81606, 418.7284, 1117.0525, 741.6851]
2025-09-12 10:02:38,824 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [183.0, 110.0, 248.0, 106.0, 479.0, 374.0, 45.0, 160.0, 379.0, 259.0]
2025-09-12 10:02:38,835 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 45/100 (estimated time remaining: 12 hours, 17 minutes, 20 seconds)
2025-09-12 10:15:10,338 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:15:10,349 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:16:38,286 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 884.18311 ± 806.266
2025-09-12 10:16:38,287 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [542.55664, 425.95508, 1732.7266, 226.77568, 117.12907, 625.4692, 2758.4097, 945.60956, 1367.875, 99.32427]
2025-09-12 10:16:38,287 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [209.0, 169.0, 513.0, 105.0, 62.0, 228.0, 886.0, 352.0, 429.0, 56.0]
2025-09-12 10:16:38,293 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 46/100 (estimated time remaining: 12 hours, 17 minutes, 23 seconds)
2025-09-12 10:28:30,540 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:28:30,543 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:30:13,971 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 1058.78589 ± 868.681
2025-09-12 10:30:13,973 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [663.95215, 451.1869, 1678.6628, 295.13248, 210.7984, 2471.87, 148.08588, 709.31415, 2567.2295, 1391.627]
2025-09-12 10:30:13,973 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [238.0, 175.0, 525.0, 127.0, 93.0, 793.0, 74.0, 222.0, 839.0, 465.0]
2025-09-12 10:30:13,982 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 47/100 (estimated time remaining: 12 hours, 16 minutes, 1 second)
2025-09-12 10:42:17,819 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:42:17,822 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:44:16,416 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 1152.45630 ± 1145.036
2025-09-12 10:44:16,427 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [860.11664, 226.61816, 2893.9868, 136.13882, 2845.7183, 14.591599, 314.5155, 1381.3256, 2658.0117, 193.53983]
2025-09-12 10:44:16,427 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [327.0, 102.0, 1000.0, 69.0, 975.0, 16.0, 136.0, 479.0, 887.0, 102.0]
2025-09-12 10:44:16,434 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 48/100 (estimated time remaining: 12 hours, 5 minutes, 29 seconds)
2025-09-12 10:56:12,965 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:56:12,967 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:57:18,689 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 665.18860 ± 397.786
2025-09-12 10:57:18,690 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [1093.0594, 558.6887, 155.48297, 1457.1868, 75.53375, 668.0855, 396.78656, 726.8969, 587.89246, 932.27277]
2025-09-12 10:57:18,690 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [343.0, 207.0, 74.0, 454.0, 48.0, 232.0, 155.0, 259.0, 211.0, 308.0]
2025-09-12 10:57:18,700 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 49/100 (estimated time remaining: 11 hours, 46 minutes, 44 seconds)
2025-09-12 11:09:39,461 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:09:39,478 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:11:19,741 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 996.58331 ± 847.255
2025-09-12 11:11:19,742 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [918.1822, 308.8198, 2942.4417, 244.3189, 1247.0355, 297.07355, 94.712906, 663.0174, 1424.8805, 1825.3508]
2025-09-12 11:11:19,742 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [334.0, 132.0, 967.0, 131.0, 393.0, 118.0, 54.0, 246.0, 454.0, 617.0]
2025-09-12 11:11:19,749 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 50/100 (estimated time remaining: 11 hours, 40 minutes, 33 seconds)
2025-09-12 11:23:24,239 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:23:24,242 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:25:02,009 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 960.39783 ± 857.894
2025-09-12 11:25:02,018 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [406.31097, 3017.7068, 1925.5829, 1062.5701, 526.28375, 397.13235, 408.45706, 203.0544, 1307.7687, 349.11133]
2025-09-12 11:25:02,018 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [160.0, 1000.0, 638.0, 373.0, 193.0, 154.0, 161.0, 95.0, 435.0, 141.0]
2025-09-12 11:25:02,062 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 51/100 (estimated time remaining: 11 hours, 23 minutes, 57 seconds)
2025-09-12 11:37:37,738 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:37:37,749 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:39:09,246 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 911.33917 ± 745.147
2025-09-12 11:39:09,250 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [2664.5337, 1460.474, 79.522354, 957.54114, 332.3454, 963.77325, 174.83101, 1384.3514, 308.26282, 787.7557]
2025-09-12 11:39:09,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [901.0, 485.0, 48.0, 326.0, 138.0, 334.0, 85.0, 443.0, 126.0, 271.0]
2025-09-12 11:39:09,263 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 52/100 (estimated time remaining: 11 hours, 15 minutes, 25 seconds)
2025-09-12 11:51:41,362 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:51:41,385 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:53:44,950 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 1263.48767 ± 619.850
2025-09-12 11:53:44,953 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [2902.5818, 1448.5844, 1179.3097, 1075.3715, 1127.8545, 418.67242, 1226.1946, 1112.2168, 1424.9622, 719.1298]
2025-09-12 11:53:44,953 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 498.0, 379.0, 330.0, 382.0, 170.0, 421.0, 378.0, 495.0, 254.0]
2025-09-12 11:53:44,965 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 53/100 (estimated time remaining: 11 hours, 6 minutes, 57 seconds)
2025-09-12 12:04:52,399 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:04:52,402 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:06:01,727 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 702.66956 ± 642.199
2025-09-12 12:06:01,742 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [470.89847, 1651.995, 7.673565, 1717.7289, 908.9698, 295.61874, 195.4246, 265.60236, 51.571167, 1461.213]
2025-09-12 12:06:01,742 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [188.0, 512.0, 10.0, 519.0, 305.0, 127.0, 92.0, 113.0, 31.0, 498.0]
2025-09-12 12:06:01,768 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 54/100 (estimated time remaining: 10 hours, 45 minutes, 56 seconds)
2025-09-12 12:18:09,563 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:18:09,565 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:20:01,655 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 1123.46155 ± 985.809
2025-09-12 12:20:01,656 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [59.87386, 375.4612, 781.9635, 426.78867, 964.1691, 445.50403, 484.7732, 2974.2197, 2211.572, 2510.2898]
2025-09-12 12:20:01,656 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [37.0, 151.0, 245.0, 161.0, 340.0, 168.0, 187.0, 1000.0, 720.0, 827.0]
2025-09-12 12:20:01,667 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 55/100 (estimated time remaining: 10 hours, 32 minutes, 1 second)
2025-09-12 12:32:18,232 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:32:18,234 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:33:47,616 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 863.19275 ± 758.805
2025-09-12 12:33:47,618 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [64.149536, 927.7937, 899.56696, 2924.208, 543.48944, 554.9484, 607.2562, 1161.511, 800.2407, 148.76381]
2025-09-12 12:33:47,618 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [38.0, 339.0, 321.0, 1000.0, 205.0, 212.0, 207.0, 407.0, 299.0, 74.0]
2025-09-12 12:33:47,631 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 56/100 (estimated time remaining: 10 hours, 18 minutes, 50 seconds)
2025-09-12 12:45:47,076 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:45:47,093 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:47:42,176 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 1242.14771 ± 1081.810
2025-09-12 12:47:42,177 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [165.83156, 1845.5371, 908.4091, 3118.0168, 1173.6554, 875.90924, 152.98174, 70.98388, 3180.2178, 929.9336]
2025-09-12 12:47:42,177 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [79.0, 593.0, 279.0, 1000.0, 354.0, 257.0, 75.0, 44.0, 1000.0, 298.0]
2025-09-12 12:47:42,184 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 57/100 (estimated time remaining: 10 hours, 3 minutes, 13 seconds)
2025-09-12 13:00:38,389 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:00:38,394 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:02:29,247 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 1143.16504 ± 953.122
2025-09-12 13:02:29,250 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [2797.3936, 1590.2158, 69.57233, 102.6002, 1451.2305, 283.35327, 822.8998, 455.20172, 1134.0624, 2725.121]
2025-09-12 13:02:29,250 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [871.0, 514.0, 44.0, 57.0, 504.0, 121.0, 292.0, 181.0, 366.0, 886.0]
2025-09-12 13:02:29,265 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 58/100 (estimated time remaining: 9 hours, 51 minutes, 8 seconds)
2025-09-12 13:14:04,467 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:14:04,470 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:15:57,357 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 1186.01416 ± 902.993
2025-09-12 13:15:57,362 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [659.4962, 3195.3186, 375.2946, 1456.4077, 2141.0708, 1527.488, 266.2273, 223.57988, 658.0606, 1357.1985]
2025-09-12 13:15:57,362 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [236.0, 1000.0, 146.0, 448.0, 670.0, 520.0, 119.0, 101.0, 247.0, 455.0]
2025-09-12 13:15:57,378 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 59/100 (estimated time remaining: 9 hours, 47 minutes, 23 seconds)
2025-09-12 13:27:48,412 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:27:48,414 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:29:14,571 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 901.19659 ± 637.821
2025-09-12 13:29:14,575 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [377.6395, 1610.192, 254.06528, 1655.5382, 302.16138, 788.7967, 1292.2814, 84.427505, 1938.0305, 708.83295]
2025-09-12 13:29:14,575 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [155.0, 498.0, 114.0, 501.0, 134.0, 281.0, 419.0, 50.0, 585.0, 244.0]
2025-09-12 13:29:14,590 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 60/100 (estimated time remaining: 9 hours, 27 minutes, 33 seconds)
2025-09-12 13:42:22,384 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:42:22,390 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:43:56,707 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 1005.28827 ± 575.751
2025-09-12 13:43:56,710 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [1419.3038, 1907.3644, 1042.2423, 805.01086, 1845.3784, 394.68472, 1002.7201, 226.27724, 1164.5195, 245.38063]
2025-09-12 13:43:56,710 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [433.0, 593.0, 321.0, 274.0, 586.0, 157.0, 334.0, 102.0, 366.0, 109.0]
2025-09-12 13:43:56,725 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 61/100 (estimated time remaining: 9 hours, 21 minutes, 12 seconds)
2025-09-12 13:55:35,389 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:55:35,395 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:57:54,729 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 1438.31787 ± 1188.690
2025-09-12 13:57:54,738 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [2833.6553, 2299.9426, 736.41766, 3090.4102, 42.37589, 466.55978, 478.619, 3106.6558, 1124.0321, 204.50972]
2025-09-12 13:57:54,738 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [912.0, 738.0, 267.0, 1000.0, 32.0, 182.0, 188.0, 1000.0, 394.0, 95.0]
2025-09-12 13:57:54,748 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 62/100 (estimated time remaining: 9 hours, 7 minutes, 37 seconds)
2025-09-12 14:09:53,640 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:09:53,648 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:11:31,408 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 1029.06409 ± 901.955
2025-09-12 14:11:31,409 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [304.05093, 138.71692, 3233.6394, 1141.2701, 373.47437, 1035.4462, 1179.2311, 120.92271, 924.79486, 1839.0935]
2025-09-12 14:11:31,409 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [132.0, 71.0, 1000.0, 360.0, 149.0, 335.0, 396.0, 65.0, 295.0, 601.0]
2025-09-12 14:11:31,418 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 63/100 (estimated time remaining: 8 hours, 44 minutes, 40 seconds)
2025-09-12 14:23:16,703 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:23:16,723 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:25:27,746 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 1362.05273 ± 896.466
2025-09-12 14:25:27,749 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [173.6973, 1121.9897, 1830.4056, 3035.3264, 617.98267, 1104.5092, 2717.7107, 1236.3242, 329.73883, 1452.843]
2025-09-12 14:25:27,749 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [83.0, 403.0, 642.0, 938.0, 217.0, 333.0, 850.0, 426.0, 137.0, 482.0]
2025-09-12 14:25:27,797 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 64/100 (estimated time remaining: 8 hours, 34 minutes, 21 seconds)
2025-09-12 14:37:39,446 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:37:39,456 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:39:58,009 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 1417.96509 ± 834.386
2025-09-12 14:39:58,012 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [1114.7947, 246.50981, 1633.1112, 574.6044, 3005.927, 209.6984, 1755.7804, 1823.8572, 1897.9473, 1917.421]
2025-09-12 14:39:58,012 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [380.0, 108.0, 542.0, 219.0, 1000.0, 95.0, 566.0, 587.0, 626.0, 641.0]
2025-09-12 14:39:58,031 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 65/100 (estimated time remaining: 8 hours, 29 minutes, 12 seconds)
2025-09-12 14:52:57,954 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:52:57,961 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:54:22,779 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 870.44080 ± 833.117
2025-09-12 14:54:22,781 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [2753.5046, 80.20566, 1602.8778, 96.201294, 1251.5433, 307.4018, 895.6699, 106.96679, 1343.6252, 266.4122]
2025-09-12 14:54:22,781 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [827.0, 46.0, 495.0, 55.0, 429.0, 131.0, 313.0, 59.0, 444.0, 126.0]
2025-09-12 14:54:22,791 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 66/100 (estimated time remaining: 8 hours, 13 minutes, 2 seconds)
2025-09-12 15:05:42,881 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:05:42,883 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:07:23,383 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 1010.78320 ± 637.332
2025-09-12 15:07:23,393 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [1691.1443, 802.248, 1440.3069, 108.14396, 1241.1794, 123.247025, 1544.6521, 1876.5181, 189.46855, 1090.9237]
2025-09-12 15:07:23,393 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [552.0, 281.0, 483.0, 58.0, 419.0, 63.0, 512.0, 622.0, 90.0, 373.0]
2025-09-12 15:07:23,424 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 67/100 (estimated time remaining: 7 hours, 52 minutes, 26 seconds)
2025-09-12 15:19:56,026 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:19:56,031 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:21:49,383 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 1178.68701 ± 1080.310
2025-09-12 15:21:49,394 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [1445.5952, 3140.0178, 1230.2478, 88.50242, 3177.3105, 353.25766, 1030.7601, 95.262505, 770.8535, 455.06287]
2025-09-12 15:21:49,394 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [466.0, 1000.0, 397.0, 52.0, 1000.0, 143.0, 337.0, 54.0, 273.0, 173.0]
2025-09-12 15:21:49,415 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 68/100 (estimated time remaining: 7 hours, 43 minutes, 58 seconds)
2025-09-12 15:33:41,019 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:33:41,022 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:35:00,786 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 829.37000 ± 648.000
2025-09-12 15:35:00,798 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [1369.1266, 747.31604, 405.50354, 670.4005, 412.07797, 738.04205, 311.26324, 1059.5034, 113.931, 2466.5356]
2025-09-12 15:35:00,798 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [407.0, 267.0, 155.0, 250.0, 157.0, 262.0, 130.0, 334.0, 61.0, 723.0]
2025-09-12 15:35:00,826 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 69/100 (estimated time remaining: 7 hours, 25 minutes, 7 seconds)
2025-09-12 15:47:27,380 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:47:27,402 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:49:16,669 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 1127.82324 ± 897.151
2025-09-12 15:49:16,681 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [122.264595, 3087.9058, 1793.802, 289.65515, 1844.7426, 1242.6632, 1010.4516, 451.01602, 120.142296, 1315.5886]
2025-09-12 15:49:16,681 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [64.0, 1000.0, 612.0, 129.0, 570.0, 395.0, 334.0, 192.0, 62.0, 431.0]
2025-09-12 15:49:16,704 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 70/100 (estimated time remaining: 7 hours, 9 minutes, 43 seconds)
2025-09-12 16:01:24,219 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:01:24,224 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:03:33,832 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 1375.08716 ± 841.110
2025-09-12 16:03:33,835 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [380.977, 465.3615, 1501.5024, 1619.0864, 2109.7473, 1431.6389, 1566.5708, 151.9548, 3137.9067, 1386.1254]
2025-09-12 16:03:33,835 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [148.0, 173.0, 498.0, 488.0, 673.0, 484.0, 522.0, 73.0, 1000.0, 441.0]
2025-09-12 16:03:33,849 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 71/100 (estimated time remaining: 6 hours, 55 minutes, 6 seconds)
2025-09-12 16:15:18,350 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:15:18,353 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:17:36,721 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 1498.62036 ± 856.663
2025-09-12 16:17:36,724 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [2063.7195, 1449.2546, 1198.32, 386.467, 1461.7288, 2626.045, 1224.2943, 534.62604, 3218.2441, 823.5049]
2025-09-12 16:17:36,724 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [651.0, 432.0, 408.0, 151.0, 434.0, 829.0, 410.0, 199.0, 1000.0, 302.0]
2025-09-12 16:17:36,739 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 72/100 (estimated time remaining: 6 hours, 47 minutes, 17 seconds)
2025-09-12 16:29:39,576 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:29:39,580 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:31:47,176 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 1324.97864 ± 718.052
2025-09-12 16:31:47,179 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [1456.7333, 1492.9188, 119.897385, 3149.7195, 1255.7227, 1126.2864, 826.7462, 1166.1256, 1254.4849, 1401.1515]
2025-09-12 16:31:47,179 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [484.0, 507.0, 78.0, 1000.0, 422.0, 348.0, 284.0, 389.0, 424.0, 437.0]
2025-09-12 16:31:47,194 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 73/100 (estimated time remaining: 6 hours, 31 minutes, 47 seconds)
2025-09-12 16:44:37,597 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:44:37,602 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:47:04,974 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 1554.34302 ± 1216.146
2025-09-12 16:47:04,994 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [70.23419, 1883.1389, 3147.091, 61.188553, 3015.4485, 825.3673, 454.15543, 3006.5898, 2469.3533, 610.8623]
2025-09-12 16:47:04,994 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [41.0, 589.0, 1000.0, 37.0, 978.0, 278.0, 172.0, 953.0, 793.0, 217.0]
2025-09-12 16:47:05,019 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 74/100 (estimated time remaining: 6 hours, 29 minutes, 10 seconds)
2025-09-12 16:58:35,899 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:58:35,906 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:00:04,501 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 888.00769 ± 777.207
2025-09-12 17:00:04,510 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [1511.8145, 1238.3326, 2724.2725, 250.0469, 144.24023, 105.75303, 348.55685, 423.04358, 985.1682, 1148.849]
2025-09-12 17:00:04,511 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [494.0, 407.0, 870.0, 110.0, 71.0, 63.0, 145.0, 160.0, 335.0, 389.0]
2025-09-12 17:00:04,519 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 75/100 (estimated time remaining: 6 hours, 8 minutes, 8 seconds)
2025-09-12 17:12:53,616 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:12:53,620 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:15:54,827 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 1866.76013 ± 1071.477
2025-09-12 17:15:54,836 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [790.441, 453.44876, 936.1228, 2969.286, 2197.2705, 2933.6475, 2974.7625, 2003.0653, 3064.349, 345.20795]
2025-09-12 17:15:54,836 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [278.0, 176.0, 325.0, 1000.0, 671.0, 1000.0, 982.0, 677.0, 1000.0, 143.0]
2025-09-12 17:15:54,836 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1226 [INFO]: New best (1866.76) for latency MM1Queue_a033_s075
2025-09-12 17:15:54,861 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 76/100 (estimated time remaining: 6 hours, 1 minute, 45 seconds)
2025-09-12 17:27:53,782 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:27:53,787 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:29:55,779 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 1288.22095 ± 837.925
2025-09-12 17:29:55,782 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [2125.2961, 155.90826, 905.60583, 1350.2542, 931.8858, 772.888, 1899.7842, 152.42099, 2929.6948, 1658.4713]
2025-09-12 17:29:55,782 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [661.0, 79.0, 318.0, 462.0, 327.0, 241.0, 619.0, 76.0, 915.0, 528.0]
2025-09-12 17:29:55,794 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 77/100 (estimated time remaining: 5 hours, 47 minutes, 7 seconds)
2025-09-12 17:41:43,411 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:41:43,413 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:43:58,648 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 1443.41846 ± 1029.595
2025-09-12 17:43:58,655 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [934.92505, 1241.9176, 486.23914, 3156.9004, 199.76709, 1568.7042, 2909.176, 1467.2958, 83.386955, 2385.872]
2025-09-12 17:43:58,655 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [325.0, 412.0, 184.0, 1000.0, 91.0, 499.0, 938.0, 493.0, 49.0, 725.0]
2025-09-12 17:43:58,664 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 78/100 (estimated time remaining: 5 hours, 32 minutes, 4 seconds)
2025-09-12 17:56:03,785 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:56:03,793 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:58:27,533 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 1469.17090 ± 786.032
2025-09-12 17:58:27,534 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [1368.3375, 1062.0083, 3056.456, 1456.0618, 171.90285, 558.87744, 2360.9397, 1858.1335, 1312.676, 1486.3169]
2025-09-12 17:58:27,534 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [478.0, 366.0, 1000.0, 492.0, 81.0, 203.0, 797.0, 629.0, 450.0, 508.0]
2025-09-12 17:58:27,545 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 79/100 (estimated time remaining: 5 hours, 14 minutes, 3 seconds)
2025-09-12 18:11:02,453 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:11:02,459 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:14:10,762 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 1983.63062 ± 979.289
2025-09-12 18:14:10,765 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [292.20526, 2558.812, 687.7726, 3109.3157, 3058.4448, 1608.2986, 1067.5898, 2195.399, 3097.329, 2161.1387]
2025-09-12 18:14:10,765 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [129.0, 833.0, 250.0, 1000.0, 1000.0, 541.0, 352.0, 725.0, 1000.0, 731.0]
2025-09-12 18:14:10,765 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1226 [INFO]: New best (1983.63) for latency MM1Queue_a033_s075
2025-09-12 18:14:10,782 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 80/100 (estimated time remaining: 5 hours, 11 minutes, 14 seconds)
2025-09-12 18:26:10,809 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:26:10,814 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:27:25,235 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 763.73206 ± 722.105
2025-09-12 18:27:25,235 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [921.54926, 166.29466, 111.039795, 992.3058, 1032.1694, 108.430275, 820.22614, 2644.8564, 613.8401, 226.60817]
2025-09-12 18:27:25,235 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [311.0, 78.0, 59.0, 335.0, 317.0, 57.0, 286.0, 840.0, 228.0, 102.0]
2025-09-12 18:27:25,245 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 81/100 (estimated time remaining: 4 hours, 46 minutes, 1 second)
2025-09-12 18:39:02,663 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:39:02,666 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:40:41,728 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 1076.48022 ± 996.828
2025-09-12 18:40:41,738 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [95.212234, 1654.9487, 385.0305, 636.5627, 86.24048, 711.64435, 3277.5522, 2366.1272, 1063.016, 488.46848]
2025-09-12 18:40:41,738 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [53.0, 485.0, 152.0, 238.0, 53.0, 252.0, 1000.0, 724.0, 368.0, 186.0]
2025-09-12 18:40:41,759 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 82/100 (estimated time remaining: 4 hours, 28 minutes, 54 seconds)
2025-09-12 18:52:29,637 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:52:29,644 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:53:59,502 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 973.15979 ± 721.382
2025-09-12 18:53:59,505 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [1040.116, 1288.523, 241.15984, 1991.7953, 2068.4402, 92.66744, 261.48157, 90.785995, 1243.764, 1412.8657]
2025-09-12 18:53:59,505 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [322.0, 416.0, 105.0, 633.0, 614.0, 53.0, 116.0, 52.0, 397.0, 414.0]
2025-09-12 18:53:59,522 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 83/100 (estimated time remaining: 4 hours, 12 minutes, 3 seconds)
2025-09-12 19:06:28,345 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:06:28,349 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:08:07,141 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 1003.79675 ± 966.343
2025-09-12 19:08:07,184 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [68.04145, 349.19583, 1221.0138, 2428.8516, 75.40638, 836.5894, 1197.6108, 713.2975, 3039.2534, 108.708115]
2025-09-12 19:08:07,184 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [42.0, 147.0, 406.0, 801.0, 43.0, 300.0, 414.0, 256.0, 1000.0, 59.0]
2025-09-12 19:08:07,202 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 84/100 (estimated time remaining: 3 hours, 56 minutes, 50 seconds)
2025-09-12 19:19:16,963 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:19:16,965 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:20:56,017 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 1006.50018 ± 697.978
2025-09-12 19:20:56,018 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [1155.333, 452.86447, 1750.7584, 297.23062, 1274.8488, 2228.5984, 189.71785, 978.41125, 1647.9174, 89.3218]
2025-09-12 19:20:56,018 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [400.0, 180.0, 577.0, 126.0, 427.0, 719.0, 89.0, 340.0, 532.0, 54.0]
2025-09-12 19:20:56,030 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 85/100 (estimated time remaining: 3 hours, 33 minutes, 36 seconds)
2025-09-12 19:33:20,435 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:33:20,438 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:35:21,564 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 1309.94653 ± 968.795
2025-09-12 19:35:21,572 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [2418.3774, 112.95702, 2177.3635, 1192.7816, 1847.9167, 160.46053, 341.11206, 364.61075, 2865.5776, 1618.3075]
2025-09-12 19:35:21,572 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [737.0, 59.0, 690.0, 398.0, 604.0, 79.0, 143.0, 148.0, 908.0, 478.0]
2025-09-12 19:35:21,585 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 86/100 (estimated time remaining: 3 hours, 23 minutes, 49 seconds)
2025-09-12 19:47:00,058 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:47:00,060 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:49:51,614 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 1856.80896 ± 734.821
2025-09-12 19:49:51,616 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [2408.3044, 2005.951, 2305.1003, 2588.9233, 3068.645, 695.2532, 1780.4379, 757.89923, 1380.5964, 1576.9789]
2025-09-12 19:49:51,616 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [747.0, 641.0, 724.0, 849.0, 953.0, 248.0, 578.0, 263.0, 465.0, 515.0]
2025-09-12 19:49:51,631 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 87/100 (estimated time remaining: 3 hours, 13 minutes, 39 seconds)
2025-09-12 20:02:46,146 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:02:46,151 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:04:18,715 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 965.18640 ± 830.458
2025-09-12 20:04:18,718 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [506.04694, 438.18958, 1570.9745, 927.4311, 416.91846, 950.04834, 3173.3694, 678.23016, 83.6906, 906.9644]
2025-09-12 20:04:18,718 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [192.0, 165.0, 518.0, 320.0, 164.0, 326.0, 1000.0, 213.0, 50.0, 277.0]
2025-09-12 20:04:18,745 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 88/100 (estimated time remaining: 3 hours, 2 minutes, 49 seconds)
2025-09-12 20:16:09,426 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:16:09,460 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:17:51,472 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 1077.58618 ± 1076.386
2025-09-12 20:17:51,473 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [860.2924, 62.569553, 54.7968, 2568.2188, 222.7561, 1978.0924, 1144.8469, 367.94382, 3218.3694, 297.97552]
2025-09-12 20:17:51,473 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [287.0, 42.0, 33.0, 823.0, 101.0, 638.0, 355.0, 139.0, 1000.0, 129.0]
2025-09-12 20:17:51,483 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 89/100 (estimated time remaining: 2 hours, 47 minutes, 22 seconds)
2025-09-12 20:28:54,230 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:28:54,233 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:30:35,662 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 1051.23145 ± 902.615
2025-09-12 20:30:35,664 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [519.25574, 3031.4463, 791.2965, 92.78811, 1148.4792, 499.29626, 1773.743, 2023.8999, 532.3153, 99.7937]
2025-09-12 20:30:35,664 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [195.0, 953.0, 269.0, 55.0, 396.0, 188.0, 594.0, 649.0, 192.0, 54.0]
2025-09-12 20:30:35,678 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 90/100 (estimated time remaining: 2 hours, 33 minutes, 15 seconds)
2025-09-12 20:42:57,980 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:42:57,985 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:44:02,201 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 648.39807 ± 608.365
2025-09-12 20:44:02,206 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [71.45935, 278.13904, 1852.35, 1296.7456, 344.60617, 1057.5565, 137.27231, 1192.4479, 127.92836, 125.475624]
2025-09-12 20:44:02,206 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [41.0, 124.0, 604.0, 438.0, 143.0, 317.0, 70.0, 411.0, 67.0, 64.0]
2025-09-12 20:44:02,222 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 91/100 (estimated time remaining: 2 hours, 17 minutes, 21 seconds)
2025-09-12 20:56:17,222 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:56:17,231 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:57:58,366 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 1052.07202 ± 825.714
2025-09-12 20:57:58,370 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [3171.1582, 684.99164, 835.01105, 1017.28455, 1039.1353, 1714.3497, 673.65393, 78.84597, 267.78937, 1038.5013]
2025-09-12 20:57:58,370 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 242.0, 289.0, 341.0, 359.0, 547.0, 232.0, 46.0, 115.0, 354.0]
2025-09-12 20:57:58,385 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 92/100 (estimated time remaining: 2 hours, 2 minutes, 36 seconds)
2025-09-12 21:09:39,451 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:09:39,466 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:12:31,767 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 1858.30237 ± 913.126
2025-09-12 21:12:31,775 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [1579.6455, 2662.614, 1774.6523, 3140.2378, 2262.7585, 540.3413, 3088.6492, 691.7027, 2055.8945, 786.5265]
2025-09-12 21:12:31,775 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [516.0, 839.0, 605.0, 1000.0, 747.0, 202.0, 1000.0, 246.0, 657.0, 276.0]
2025-09-12 21:12:31,790 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 93/100 (estimated time remaining: 1 hour, 49 minutes, 8 seconds)
2025-09-12 21:24:33,600 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:24:33,604 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:25:32,654 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 570.84167 ± 571.267
2025-09-12 21:25:32,654 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [461.86255, 707.9198, 672.16656, 2152.6965, 103.52733, 476.39893, 74.38987, 89.87885, 498.27185, 471.30466]
2025-09-12 21:25:32,654 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [182.0, 259.0, 243.0, 679.0, 55.0, 181.0, 47.0, 52.0, 189.0, 179.0]
2025-09-12 21:25:32,673 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 34 minutes, 45 seconds)
2025-09-12 21:37:50,275 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:37:50,279 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:40:17,284 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 1598.05347 ± 872.358
2025-09-12 21:40:17,285 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [3156.9512, 1675.9133, 364.92218, 1657.8815, 1679.1019, 2407.7861, 396.06122, 2414.3442, 662.51855, 1565.0546]
2025-09-12 21:40:17,285 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 501.0, 149.0, 539.0, 535.0, 770.0, 159.0, 795.0, 234.0, 499.0]
2025-09-12 21:40:17,297 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 23 minutes, 37 seconds)
2025-09-12 21:51:26,256 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:51:26,288 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:53:11,050 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 1093.28003 ± 895.178
2025-09-12 21:53:11,057 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [871.4234, 218.00316, 1202.2483, 89.85305, 2257.0571, 387.90842, 3125.8914, 925.5599, 689.96173, 1164.8933]
2025-09-12 21:53:11,057 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [305.0, 98.0, 400.0, 53.0, 748.0, 150.0, 1000.0, 318.0, 240.0, 357.0]
2025-09-12 21:53:11,072 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 9 minutes, 8 seconds)
2025-09-12 22:04:44,286 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:04:44,290 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:06:05,011 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 823.12732 ± 578.431
2025-09-12 22:06:05,013 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [212.63417, 236.60854, 411.207, 770.7591, 275.54852, 1794.3871, 800.84125, 627.6794, 1310.1996, 1791.4087]
2025-09-12 22:06:05,013 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [117.0, 107.0, 158.0, 259.0, 116.0, 567.0, 275.0, 222.0, 396.0, 598.0]
2025-09-12 22:06:05,039 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 97/100 (estimated time remaining: 54 minutes, 29 seconds)
2025-09-12 22:18:39,550 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:18:39,562 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:22:07,694 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 2250.64502 ± 901.672
2025-09-12 22:22:07,696 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [2032.0614, 2223.738, 3072.6648, 749.924, 728.2535, 1609.4364, 3075.4265, 3120.9695, 3040.2607, 2853.7144]
2025-09-12 22:22:07,696 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [644.0, 722.0, 1000.0, 269.0, 262.0, 529.0, 1000.0, 1000.0, 1000.0, 925.0]
2025-09-12 22:22:07,696 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1226 [INFO]: New best (2250.65) for latency MM1Queue_a033_s075
2025-09-12 22:22:07,707 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 98/100 (estimated time remaining: 41 minutes, 45 seconds)
2025-09-12 22:33:19,113 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:33:19,115 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:35:05,262 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 1103.81421 ± 1008.374
2025-09-12 22:35:05,277 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [326.775, 369.56787, 304.2831, 478.68536, 958.0817, 285.48053, 1524.5227, 836.3574, 2841.7405, 3112.648]
2025-09-12 22:35:05,277 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [135.0, 148.0, 130.0, 180.0, 323.0, 122.0, 455.0, 287.0, 924.0, 1000.0]
2025-09-12 22:35:05,300 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 99/100 (estimated time remaining: 27 minutes, 49 seconds)
2025-09-12 22:47:34,542 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:47:34,545 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:49:39,971 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 1326.86987 ± 1024.358
2025-09-12 22:49:39,973 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [3201.3118, 130.37636, 256.94092, 1196.5887, 527.473, 126.3625, 1838.5428, 1367.4518, 2085.717, 2537.935]
2025-09-12 22:49:39,973 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 68.0, 111.0, 400.0, 201.0, 66.0, 590.0, 472.0, 682.0, 789.0]
2025-09-12 22:49:39,988 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 100/100 (estimated time remaining: 13 minutes, 52 seconds)
2025-09-12 23:01:51,170 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:01:51,173 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 23:03:29,956 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 1034.83716 ± 747.059
2025-09-12 23:03:29,959 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [1532.3783, 569.4162, 47.03992, 131.44495, 35.64336, 2240.9458, 1820.0234, 1368.5339, 1271.453, 1331.4918]
2025-09-12 23:03:29,959 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [501.0, 207.0, 34.0, 66.0, 29.0, 731.0, 595.0, 412.0, 423.0, 446.0]
2025-09-12 23:03:29,971 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1251 [DEBUG]: Training session finished
