2025-09-11 23:59:03,380 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noiseperc0-hopper/MM1Queue_a033_s075-mbpac_memdelay
2025-09-11 23:59:03,380 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noiseperc0-hopper/MM1Queue_a033_s075-mbpac_memdelay
2025-09-11 23:59:03,380 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x150812d3f490>}
2025-09-11 23:59:03,380 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1111 [DEBUG]: using device: cuda
2025-09-11 23:59:03,386 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1133 [INFO]: Creating new trainer
2025-09-11 23:59:03,403 baseline-mbpac-noiseperc0-hopper:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2.]]), shift: tensor([[-1., -1., -1.]]))
)
2025-09-11 23:59:03,403 baseline-mbpac-noiseperc0-hopper:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=14, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-11 23:59:03,411 baseline-mbpac-noiseperc0-hopper:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=11, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=11, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=11, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=3, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 384, batch_first=True)
)
2025-09-11 23:59:04,417 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1194 [DEBUG]: Starting training session...
2025-09-11 23:59:04,417 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 1/100
2025-09-12 00:09:33,017 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:09:33,024 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 00:09:53,434 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 114.67202 ± 32.496
2025-09-12 00:09:53,436 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [113.6754, 69.59769, 127.83514, 148.0239, 130.91022, 138.00005, 131.38898, 45.03111, 145.83455, 96.42311]
2025-09-12 00:09:53,436 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [60.0, 40.0, 67.0, 75.0, 69.0, 70.0, 69.0, 27.0, 75.0, 52.0]
2025-09-12 00:09:53,436 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1226 [INFO]: New best (114.67) for latency MM1Queue_a033_s075
2025-09-12 00:09:53,444 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 2/100 (estimated time remaining: 17 hours, 50 minutes, 53 seconds)
2025-09-12 00:21:57,369 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:21:57,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 00:22:42,752 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 232.98441 ± 155.388
2025-09-12 00:22:42,753 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [45.36707, 235.57101, 457.31778, 474.31885, 197.71051, 429.1304, 181.2021, 145.69505, 50.72492, 112.80639]
2025-09-12 00:22:42,753 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [35.0, 177.0, 340.0, 337.0, 144.0, 244.0, 128.0, 108.0, 41.0, 85.0]
2025-09-12 00:22:42,753 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1226 [INFO]: New best (232.98) for latency MM1Queue_a033_s075
2025-09-12 00:22:42,757 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 3/100 (estimated time remaining: 19 hours, 18 minutes, 18 seconds)
2025-09-12 00:34:18,578 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:34:18,579 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 00:34:42,863 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 129.16805 ± 61.875
2025-09-12 00:34:42,863 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [151.09613, 146.4727, 65.34663, 91.723206, 232.2447, 167.97762, 76.172516, 186.65747, 158.50502, 15.484488]
2025-09-12 00:34:42,864 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [77.0, 79.0, 45.0, 54.0, 116.0, 93.0, 47.0, 108.0, 93.0, 13.0]
2025-09-12 00:34:42,867 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 4/100 (estimated time remaining: 19 hours, 12 minutes, 23 seconds)
2025-09-12 00:46:31,006 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:46:31,009 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 00:46:52,887 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 161.15073 ± 73.331
2025-09-12 00:46:52,887 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [88.72948, 238.47116, 113.49823, 241.41055, 39.00604, 227.90886, 240.74844, 207.71104, 113.988266, 100.035126]
2025-09-12 00:46:52,887 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [51.0, 109.0, 63.0, 109.0, 26.0, 105.0, 118.0, 102.0, 65.0, 56.0]
2025-09-12 00:46:52,893 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 5/100 (estimated time remaining: 19 hours, 7 minutes, 23 seconds)
2025-09-12 00:58:46,318 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:58:46,319 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 00:59:11,112 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 204.97122 ± 109.005
2025-09-12 00:59:11,112 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [260.31073, 15.414595, 211.13342, 294.08215, 277.26385, 84.18871, 282.3393, 287.3846, 33.44991, 304.14487]
2025-09-12 00:59:11,112 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [113.0, 14.0, 98.0, 123.0, 121.0, 53.0, 124.0, 123.0, 24.0, 125.0]
2025-09-12 00:59:11,117 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 6/100 (estimated time remaining: 19 hours, 2 minutes, 7 seconds)
2025-09-12 01:10:54,403 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:10:54,404 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:11:34,630 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 255.94986 ± 47.466
2025-09-12 01:11:34,630 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [190.60635, 226.90373, 273.68448, 298.97314, 344.35803, 268.82605, 180.14415, 252.23102, 235.0648, 288.7069]
2025-09-12 01:11:34,630 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [91.0, 116.0, 121.0, 143.0, 162.0, 127.0, 92.0, 108.0, 102.0, 134.0]
2025-09-12 01:11:34,630 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1226 [INFO]: New best (255.95) for latency MM1Queue_a033_s075
2025-09-12 01:11:34,635 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 7/100 (estimated time remaining: 19 hours, 19 minutes, 42 seconds)
2025-09-12 01:23:17,130 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:23:17,132 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:24:24,857 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 380.18835 ± 302.249
2025-09-12 01:24:24,870 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [298.5708, 1073.3901, 858.1696, 215.96829, 197.4291, 189.32631, 345.3182, 146.64165, 289.51724, 187.55235]
2025-09-12 01:24:24,870 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [197.0, 626.0, 515.0, 161.0, 176.0, 113.0, 277.0, 125.0, 175.0, 118.0]
2025-09-12 01:24:24,870 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1226 [INFO]: New best (380.19) for latency MM1Queue_a033_s075
2025-09-12 01:24:24,900 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 8/100 (estimated time remaining: 19 hours, 7 minutes, 39 seconds)
2025-09-12 01:36:15,829 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:36:15,831 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:37:16,453 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 637.73975 ± 184.951
2025-09-12 01:37:16,454 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [790.45294, 414.29175, 219.44862, 724.03687, 754.6402, 593.38715, 807.06537, 781.16, 749.84875, 543.0662]
2025-09-12 01:37:16,454 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [293.0, 177.0, 111.0, 252.0, 246.0, 208.0, 261.0, 258.0, 236.0, 184.0]
2025-09-12 01:37:16,454 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1226 [INFO]: New best (637.74) for latency MM1Queue_a033_s075
2025-09-12 01:37:16,475 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 9/100 (estimated time remaining: 19 hours, 11 minutes, 6 seconds)
2025-09-12 01:48:58,891 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:48:58,893 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:49:40,261 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 434.06738 ± 172.834
2025-09-12 01:49:40,261 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [564.7006, 185.81232, 539.8133, 530.6973, 523.2955, 615.5732, 221.93974, 163.58252, 370.43805, 624.8211]
2025-09-12 01:49:40,261 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [175.0, 89.0, 180.0, 177.0, 177.0, 201.0, 99.0, 80.0, 144.0, 203.0]
2025-09-12 01:49:40,268 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 10/100 (estimated time remaining: 19 hours, 2 minutes, 46 seconds)
2025-09-12 02:01:45,843 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:01:45,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:03:42,213 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 916.66003 ± 709.097
2025-09-12 02:03:42,213 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [668.307, 1193.0609, 413.2118, 46.338463, 2589.7043, 1078.5833, 766.04095, 182.97923, 658.7274, 1569.6475]
2025-09-12 02:03:42,213 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [274.0, 479.0, 194.0, 36.0, 909.0, 356.0, 315.0, 91.0, 239.0, 618.0]
2025-09-12 02:03:42,213 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1226 [INFO]: New best (916.66) for latency MM1Queue_a033_s075
2025-09-12 02:03:42,221 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 11/100 (estimated time remaining: 19 hours, 21 minutes, 19 seconds)
2025-09-12 02:14:35,983 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:14:35,985 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:16:21,725 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 907.56610 ± 540.044
2025-09-12 02:16:21,727 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [282.132, 2419.9956, 703.7695, 780.6828, 1003.86884, 648.5815, 1005.65607, 812.73926, 652.05707, 766.1783]
2025-09-12 02:16:21,727 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [117.0, 878.0, 255.0, 271.0, 350.0, 225.0, 348.0, 322.0, 219.0, 281.0]
2025-09-12 02:16:21,733 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 12/100 (estimated time remaining: 19 hours, 13 minutes, 10 seconds)
2025-09-12 02:28:11,668 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:28:11,670 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:29:49,267 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1011.11633 ± 557.271
2025-09-12 02:29:49,268 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1438.5748, 949.89154, 510.2496, 144.4025, 1229.7726, 1881.684, 1171.7385, 526.9525, 487.3501, 1770.5482]
2025-09-12 02:29:49,268 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [495.0, 320.0, 170.0, 73.0, 446.0, 650.0, 454.0, 179.0, 181.0, 640.0]
2025-09-12 02:29:49,268 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1226 [INFO]: New best (1011.12) for latency MM1Queue_a033_s075
2025-09-12 02:29:49,273 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 13/100 (estimated time remaining: 19 hours, 11 minutes, 8 seconds)
2025-09-12 02:41:08,484 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:41:08,486 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:42:15,335 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 526.94812 ± 417.861
2025-09-12 02:42:15,336 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [790.235, 100.97185, 105.10681, 691.5628, 144.72803, 669.4635, 1394.8136, 29.135324, 872.8801, 470.58405]
2025-09-12 02:42:15,336 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [286.0, 58.0, 58.0, 276.0, 74.0, 245.0, 508.0, 32.0, 303.0, 198.0]
2025-09-12 02:42:15,353 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 14/100 (estimated time remaining: 18 hours, 50 minutes, 40 seconds)
2025-09-12 02:54:04,123 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:54:04,136 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:56:33,116 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1207.27759 ± 898.312
2025-09-12 02:56:33,118 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [2693.678, 110.46812, 1153.4214, 1902.3997, 377.25757, 1009.26074, 633.4672, 126.008575, 2571.7593, 1495.0546]
2025-09-12 02:56:33,118 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 60.0, 445.0, 652.0, 165.0, 401.0, 260.0, 69.0, 932.0, 537.0]
2025-09-12 02:56:33,118 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1226 [INFO]: New best (1207.28) for latency MM1Queue_a033_s075
2025-09-12 02:56:33,125 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 15/100 (estimated time remaining: 19 hours, 10 minutes, 21 seconds)
2025-09-12 03:08:07,835 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:08:07,854 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:10:59,399 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1450.33459 ± 1103.840
2025-09-12 03:10:59,414 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [599.65173, 2041.7513, 1422.7117, 3031.7275, 1199.695, 154.0838, 91.41757, 2714.7788, 327.43854, 2920.0886]
2025-09-12 03:10:59,414 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [258.0, 719.0, 491.0, 1000.0, 451.0, 82.0, 51.0, 1000.0, 174.0, 1000.0]
2025-09-12 03:10:59,414 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1226 [INFO]: New best (1450.33) for latency MM1Queue_a033_s075
2025-09-12 03:10:59,420 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 16/100 (estimated time remaining: 19 hours, 3 minutes, 52 seconds)
2025-09-12 03:22:29,428 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:22:29,431 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:24:24,043 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1024.90588 ± 946.620
2025-09-12 03:24:24,045 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [546.4391, 1878.5934, 3100.7317, 1195.8574, 222.53311, 147.56578, 171.42911, 1999.6632, 433.03253, 553.2142]
2025-09-12 03:24:24,045 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [198.0, 596.0, 1000.0, 394.0, 103.0, 76.0, 83.0, 653.0, 165.0, 213.0]
2025-09-12 03:24:24,051 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 17/100 (estimated time remaining: 19 hours, 3 minutes, 2 seconds)
2025-09-12 03:35:53,094 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:35:53,098 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:37:45,633 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1210.86719 ± 1059.602
2025-09-12 03:37:45,636 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [2554.8318, 279.77908, 114.19176, 3038.0276, 733.3179, 2588.9146, 563.85364, 1428.2411, 646.0159, 161.49931]
2025-09-12 03:37:45,636 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [852.0, 123.0, 61.0, 1000.0, 279.0, 865.0, 209.0, 491.0, 237.0, 80.0]
2025-09-12 03:37:45,642 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 18/100 (estimated time remaining: 18 hours, 47 minutes, 47 seconds)
2025-09-12 03:49:23,364 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:49:23,369 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:51:16,397 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1217.24841 ± 853.607
2025-09-12 03:51:16,399 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1880.5017, 471.60495, 295.42, 1887.5356, 256.98215, 1760.8619, 1807.9791, 38.412205, 1131.866, 2641.32]
2025-09-12 03:51:16,399 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [638.0, 180.0, 122.0, 641.0, 114.0, 593.0, 639.0, 32.0, 417.0, 876.0]
2025-09-12 03:51:16,404 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 19/100 (estimated time remaining: 18 hours, 51 minutes, 53 seconds)
2025-09-12 04:03:17,992 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:03:17,995 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:05:03,832 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1147.73584 ± 835.004
2025-09-12 04:05:03,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [985.09534, 3013.1064, 26.583376, 1016.18823, 1478.5503, 381.77036, 978.82574, 1630.5071, 215.18631, 1751.545]
2025-09-12 04:05:03,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [344.0, 1000.0, 24.0, 353.0, 506.0, 157.0, 343.0, 560.0, 99.0, 586.0]
2025-09-12 04:05:03,840 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 20/100 (estimated time remaining: 18 hours, 29 minutes, 53 seconds)
2025-09-12 04:16:03,766 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:16:03,770 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:17:45,640 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1069.17114 ± 743.896
2025-09-12 04:17:45,641 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [720.96295, 1979.2903, 279.84613, 1247.3291, 2415.441, 453.34546, 369.03336, 1237.6433, 217.22688, 1771.5933]
2025-09-12 04:17:45,641 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [262.0, 665.0, 116.0, 424.0, 820.0, 182.0, 156.0, 420.0, 99.0, 604.0]
2025-09-12 04:17:45,649 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 21/100 (estimated time remaining: 17 hours, 48 minutes, 19 seconds)
2025-09-12 04:29:48,526 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:29:48,530 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:31:31,505 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 887.47034 ± 849.087
2025-09-12 04:31:31,513 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [526.71387, 2444.668, 1019.5561, 57.298145, 719.57745, 2315.98, 86.37149, 90.98158, 263.70978, 1349.8469]
2025-09-12 04:31:31,513 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [199.0, 820.0, 363.0, 35.0, 257.0, 755.0, 49.0, 51.0, 117.0, 473.0]
2025-09-12 04:31:31,537 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 22/100 (estimated time remaining: 17 hours, 40 minutes, 34 seconds)
2025-09-12 04:42:59,554 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:42:59,569 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:45:22,477 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1559.02759 ± 995.268
2025-09-12 04:45:22,480 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [651.9129, 1801.4017, 273.53772, 3007.318, 451.05325, 1344.5259, 3106.433, 1491.439, 853.27844, 2609.3752]
2025-09-12 04:45:22,480 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [216.0, 592.0, 119.0, 1000.0, 175.0, 464.0, 1000.0, 470.0, 314.0, 889.0]
2025-09-12 04:45:22,480 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1226 [INFO]: New best (1559.03) for latency MM1Queue_a033_s075
2025-09-12 04:45:22,501 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 23/100 (estimated time remaining: 17 hours, 34 minutes, 46 seconds)
2025-09-12 04:56:59,151 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:56:59,155 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:59:53,000 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1909.39233 ± 1290.421
2025-09-12 04:59:53,002 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3054.957, 2550.6309, 184.49002, 95.02319, 2947.1257, 3049.8335, 146.38623, 1044.2347, 3038.2607, 2982.982]
2025-09-12 04:59:53,002 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 816.0, 88.0, 55.0, 1000.0, 1000.0, 74.0, 374.0, 1000.0, 1000.0]
2025-09-12 04:59:53,002 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1226 [INFO]: New best (1909.39) for latency MM1Queue_a033_s075
2025-09-12 04:59:53,011 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 24/100 (estimated time remaining: 17 hours, 36 minutes, 35 seconds)
2025-09-12 05:11:15,559 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:11:15,562 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:13:26,622 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1511.95972 ± 796.804
2025-09-12 05:13:26,623 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1660.8066, 1413.2926, 917.5508, 504.1866, 961.55756, 1667.0902, 923.5482, 3220.258, 1213.4961, 2637.811]
2025-09-12 05:13:26,623 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [542.0, 442.0, 314.0, 204.0, 320.0, 532.0, 314.0, 1000.0, 383.0, 836.0]
2025-09-12 05:13:26,630 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 25/100 (estimated time remaining: 17 hours, 19 minutes, 22 seconds)
2025-09-12 05:25:12,929 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:25:12,933 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:26:27,271 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 828.73987 ± 459.070
2025-09-12 05:26:27,272 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1287.0846, 1265.735, 937.6318, 176.41739, 1523.349, 759.5254, 1208.9434, 281.24585, 492.23328, 355.23288]
2025-09-12 05:26:27,273 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [403.0, 417.0, 329.0, 85.0, 472.0, 246.0, 385.0, 121.0, 186.0, 147.0]
2025-09-12 05:26:27,280 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 26/100 (estimated time remaining: 17 hours, 10 minutes, 24 seconds)
2025-09-12 05:38:19,710 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:38:19,714 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:40:14,714 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1061.37476 ± 692.502
2025-09-12 05:40:14,716 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [965.0716, 191.98616, 1326.5015, 1437.0714, 1036.3213, 2689.0684, 189.79034, 990.07526, 1311.1862, 476.6744]
2025-09-12 05:40:14,716 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [328.0, 91.0, 416.0, 467.0, 332.0, 833.0, 89.0, 336.0, 411.0, 182.0]
2025-09-12 05:40:14,721 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 27/100 (estimated time remaining: 16 hours, 57 minutes, 3 seconds)
2025-09-12 05:51:19,464 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:51:19,466 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:52:39,548 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 865.55743 ± 994.566
2025-09-12 05:52:39,549 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [172.60527, 175.12161, 3097.7446, 543.4794, 571.84375, 1483.912, 90.847176, 2200.7827, 160.64928, 158.58784]
2025-09-12 05:52:39,549 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [87.0, 88.0, 991.0, 223.0, 215.0, 466.0, 52.0, 708.0, 81.0, 80.0]
2025-09-12 05:52:39,556 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 28/100 (estimated time remaining: 16 hours, 22 minutes, 21 seconds)
2025-09-12 06:04:23,045 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:04:23,049 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:06:26,909 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1377.92200 ± 1109.353
2025-09-12 06:06:26,910 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [2305.1274, 159.39352, 3141.2358, 13.371684, 1702.3114, 452.05804, 969.1157, 657.06604, 1183.9265, 3195.6133]
2025-09-12 06:06:26,911 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [748.0, 82.0, 1000.0, 14.0, 557.0, 188.0, 335.0, 243.0, 378.0, 1000.0]
2025-09-12 06:06:26,920 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 29/100 (estimated time remaining: 15 hours, 58 minutes, 32 seconds)
2025-09-12 06:17:36,727 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:17:36,730 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:19:20,508 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1142.88501 ± 837.921
2025-09-12 06:19:20,509 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3214.7598, 650.38696, 1453.5146, 487.52777, 496.63455, 1436.3324, 1158.345, 1539.891, 975.67346, 15.783896]
2025-09-12 06:19:20,509 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 239.0, 471.0, 187.0, 197.0, 471.0, 391.0, 501.0, 343.0, 17.0]
2025-09-12 06:19:20,515 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 30/100 (estimated time remaining: 15 hours, 35 minutes, 45 seconds)
2025-09-12 06:31:03,039 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:31:03,040 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:32:19,309 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 833.90076 ± 559.687
2025-09-12 06:32:19,310 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [864.44324, 355.18207, 341.73752, 151.50662, 105.16408, 907.079, 1150.7891, 1104.2759, 1844.9094, 1513.9213]
2025-09-12 06:32:19,311 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [303.0, 140.0, 134.0, 76.0, 59.0, 310.0, 383.0, 359.0, 587.0, 491.0]
2025-09-12 06:32:19,317 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 31/100 (estimated time remaining: 15 hours, 22 minutes, 8 seconds)
2025-09-12 06:44:22,509 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:44:22,518 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:46:35,907 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1528.20898 ± 646.680
2025-09-12 06:46:35,908 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1107.7854, 474.60968, 2688.6938, 1382.3729, 1247.9567, 2030.109, 1492.3108, 2452.1062, 947.7475, 1458.3981]
2025-09-12 06:46:35,908 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [371.0, 184.0, 836.0, 457.0, 407.0, 660.0, 476.0, 803.0, 313.0, 476.0]
2025-09-12 06:46:35,914 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 32/100 (estimated time remaining: 15 hours, 15 minutes, 40 seconds)
2025-09-12 06:57:51,147 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:57:51,149 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:00:01,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1527.65796 ± 732.323
2025-09-12 07:00:01,842 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1492.234, 1126.3256, 2275.2786, 317.6019, 1453.5093, 694.7112, 3087.0205, 1648.0093, 1481.2544, 1700.6366]
2025-09-12 07:00:01,842 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [467.0, 330.0, 734.0, 133.0, 453.0, 249.0, 1000.0, 505.0, 468.0, 519.0]
2025-09-12 07:00:01,851 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 33/100 (estimated time remaining: 15 hours, 16 minutes, 15 seconds)
2025-09-12 07:11:59,452 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:11:59,456 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:14:52,017 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1975.55347 ± 1243.862
2025-09-12 07:14:52,018 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [2956.8083, 3091.2483, 205.3567, 120.0228, 2655.0112, 3137.157, 170.42409, 2502.1462, 3119.78, 1797.5813]
2025-09-12 07:14:52,018 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [958.0, 1000.0, 97.0, 80.0, 863.0, 1000.0, 83.0, 780.0, 1000.0, 569.0]
2025-09-12 07:14:52,018 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1226 [INFO]: New best (1975.55) for latency MM1Queue_a033_s075
2025-09-12 07:14:52,023 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 34/100 (estimated time remaining: 15 hours, 16 minutes, 48 seconds)
2025-09-12 07:26:34,054 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:26:34,064 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:28:15,472 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1133.95044 ± 679.605
2025-09-12 07:28:15,472 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [112.7404, 2241.3032, 1268.6655, 981.7881, 911.4226, 216.18797, 1655.0, 2130.5952, 1072.3463, 749.4538]
2025-09-12 07:28:15,473 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [62.0, 711.0, 420.0, 326.0, 315.0, 99.0, 501.0, 650.0, 367.0, 266.0]
2025-09-12 07:28:15,478 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 35/100 (estimated time remaining: 15 hours, 9 minutes, 41 seconds)
2025-09-12 07:39:48,458 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:39:48,462 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:41:35,250 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1196.61108 ± 931.620
2025-09-12 07:41:35,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1876.9795, 82.949524, 799.9696, 199.76485, 3176.5679, 1009.0786, 176.1372, 997.2955, 1791.702, 1855.6671]
2025-09-12 07:41:35,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [610.0, 50.0, 286.0, 93.0, 1000.0, 344.0, 85.0, 338.0, 590.0, 601.0]
2025-09-12 07:41:35,271 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 36/100 (estimated time remaining: 15 hours, 27 seconds)
2025-09-12 07:53:16,050 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:53:16,060 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:55:11,037 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1281.03540 ± 853.861
2025-09-12 07:55:11,038 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1243.4083, 2194.6523, 3078.651, 1011.48236, 144.22652, 910.67084, 1689.182, 963.01715, 1482.6133, 92.44977]
2025-09-12 07:55:11,038 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [411.0, 729.0, 1000.0, 352.0, 74.0, 314.0, 531.0, 331.0, 493.0, 51.0]
2025-09-12 07:55:11,044 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 37/100 (estimated time remaining: 14 hours, 37 minutes, 53 seconds)
2025-09-12 08:06:29,583 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:06:29,604 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:08:31,200 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1089.84277 ± 911.047
2025-09-12 08:08:31,202 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [836.3003, 233.17642, 562.8499, 268.92514, 1847.2056, 663.9305, 2088.086, 739.40857, 490.90378, 3167.6416]
2025-09-12 08:08:31,202 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [295.0, 106.0, 215.0, 118.0, 582.0, 242.0, 675.0, 265.0, 193.0, 1000.0]
2025-09-12 08:08:31,217 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 38/100 (estimated time remaining: 14 hours, 22 minutes, 58 seconds)
2025-09-12 08:20:07,017 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:20:07,019 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:22:44,571 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1443.55347 ± 959.289
2025-09-12 08:22:44,573 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [162.65425, 2819.6804, 990.7026, 2316.892, 1363.0719, 3163.2534, 1166.9825, 349.8028, 1330.2512, 772.2428]
2025-09-12 08:22:44,573 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [81.0, 893.0, 340.0, 765.0, 456.0, 1000.0, 395.0, 145.0, 440.0, 277.0]
2025-09-12 08:22:44,582 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 39/100 (estimated time remaining: 14 hours, 1 minute, 39 seconds)
2025-09-12 08:34:40,874 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:34:40,883 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:37:08,332 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1688.44995 ± 1165.691
2025-09-12 08:37:08,372 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3202.9753, 157.33025, 535.9905, 3080.0813, 705.17615, 2339.9219, 237.48138, 1930.5228, 1553.2881, 3141.7305]
2025-09-12 08:37:08,372 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 77.0, 205.0, 1000.0, 255.0, 754.0, 106.0, 597.0, 513.0, 1000.0]
2025-09-12 08:37:08,395 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 40/100 (estimated time remaining: 14 hours, 21 seconds)
2025-09-12 08:48:39,760 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:48:39,765 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:50:35,916 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1317.20544 ± 1130.377
2025-09-12 08:50:35,923 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [154.77599, 2696.8608, 1048.5723, 3168.0515, 1113.1896, 2518.8271, 153.1456, 2009.9182, 164.1365, 144.57713]
2025-09-12 08:50:35,923 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [78.0, 841.0, 354.0, 1000.0, 369.0, 772.0, 76.0, 630.0, 80.0, 72.0]
2025-09-12 08:50:35,931 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 41/100 (estimated time remaining: 13 hours, 48 minutes, 7 seconds)
2025-09-12 09:02:06,540 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:02:06,545 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:03:56,520 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1302.23792 ± 745.532
2025-09-12 09:03:56,522 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [2392.6409, 445.3967, 925.3891, 1602.3739, 1900.9114, 854.98737, 2544.2456, 1237.5887, 247.83035, 871.0149]
2025-09-12 09:03:56,522 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [742.0, 167.0, 300.0, 486.0, 572.0, 288.0, 782.0, 373.0, 108.0, 287.0]
2025-09-12 09:03:56,532 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 42/100 (estimated time remaining: 13 hours, 31 minutes, 20 seconds)
2025-09-12 09:14:49,982 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:14:49,987 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:16:41,029 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1262.59143 ± 1063.438
2025-09-12 09:16:41,030 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [157.54173, 629.94147, 1906.18, 2879.3208, 2596.473, 2602.684, 243.39053, 324.5802, 196.89305, 1088.9089]
2025-09-12 09:16:41,030 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [78.0, 213.0, 620.0, 901.0, 796.0, 827.0, 105.0, 133.0, 92.0, 361.0]
2025-09-12 09:16:41,039 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 43/100 (estimated time remaining: 13 hours, 10 minutes, 41 seconds)
2025-09-12 09:28:14,427 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:28:14,430 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:30:37,080 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1625.20386 ± 1296.564
2025-09-12 09:30:37,080 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [2643.6648, 199.25296, 117.94966, 2162.8262, 776.82764, 680.7296, 117.531555, 3162.3103, 3183.282, 3207.6638]
2025-09-12 09:30:37,080 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [818.0, 94.0, 63.0, 684.0, 269.0, 245.0, 64.0, 1000.0, 1000.0, 1000.0]
2025-09-12 09:30:37,086 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 44/100 (estimated time remaining: 12 hours, 53 minutes, 46 seconds)
2025-09-12 09:42:12,090 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:42:12,094 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:43:14,979 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 655.35577 ± 457.385
2025-09-12 09:43:14,982 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [264.16238, 1579.7118, 933.0533, 946.3603, 1061.6078, 315.68802, 374.15656, 119.415276, 152.35109, 807.05133]
2025-09-12 09:43:14,982 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [115.0, 517.0, 329.0, 323.0, 364.0, 128.0, 153.0, 65.0, 75.0, 282.0]
2025-09-12 09:43:14,990 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 45/100 (estimated time remaining: 12 hours, 20 minutes, 25 seconds)
2025-09-12 09:55:01,935 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:55:01,937 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:58:10,373 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1730.08813 ± 1053.286
2025-09-12 09:58:10,374 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3169.7766, 483.583, 1928.3927, 1937.839, 3070.3367, 3126.8916, 1403.182, 1003.0358, 142.91942, 1034.9255]
2025-09-12 09:58:10,374 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 175.0, 638.0, 661.0, 1000.0, 1000.0, 461.0, 351.0, 71.0, 345.0]
2025-09-12 09:58:10,389 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 46/100 (estimated time remaining: 12 hours, 23 minutes, 19 seconds)
2025-09-12 10:10:19,226 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:10:19,231 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:13:30,900 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1791.71814 ± 1011.504
2025-09-12 10:13:30,901 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3176.481, 2618.902, 1100.832, 1012.53094, 949.68884, 3166.6738, 94.95539, 1155.6344, 2094.5867, 2546.895]
2025-09-12 10:13:30,901 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 852.0, 362.0, 336.0, 325.0, 1000.0, 54.0, 388.0, 664.0, 812.0]
2025-09-12 10:13:30,908 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 47/100 (estimated time remaining: 12 hours, 31 minutes, 23 seconds)
2025-09-12 10:24:26,158 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:24:26,162 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:26:14,209 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1207.56921 ± 1176.849
2025-09-12 10:26:14,211 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3172.8125, 1937.4912, 102.66993, 149.03906, 549.1991, 1951.1448, 292.26096, 285.33594, 458.84686, 3176.8928]
2025-09-12 10:26:14,211 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 613.0, 58.0, 74.0, 203.0, 591.0, 124.0, 120.0, 177.0, 1000.0]
2025-09-12 10:26:14,218 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 48/100 (estimated time remaining: 12 hours, 17 minutes, 15 seconds)
2025-09-12 10:37:50,230 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:37:50,235 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:39:51,554 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1369.94861 ± 1082.561
2025-09-12 10:39:51,554 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [988.96936, 3247.3965, 2128.995, 370.0214, 261.78348, 989.4127, 1346.3384, 90.31287, 3206.3904, 1069.8665]
2025-09-12 10:39:51,555 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [329.0, 1000.0, 675.0, 153.0, 114.0, 339.0, 433.0, 52.0, 1000.0, 363.0]
2025-09-12 10:39:51,562 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 49/100 (estimated time remaining: 12 hours, 6 seconds)
2025-09-12 10:51:22,828 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:51:22,832 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:52:51,243 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 964.91565 ± 1162.679
2025-09-12 10:52:51,245 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [167.76346, 209.67863, 409.51358, 223.83807, 778.53345, 3163.9717, 1352.2037, 102.73646, 66.21917, 3174.6973]
2025-09-12 10:52:51,245 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [81.0, 96.0, 162.0, 100.0, 280.0, 1000.0, 449.0, 58.0, 42.0, 1000.0]
2025-09-12 10:52:51,253 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 50/100 (estimated time remaining: 11 hours, 49 minutes, 57 seconds)
2025-09-12 11:04:26,517 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:04:26,521 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:06:21,033 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1304.76782 ± 1095.590
2025-09-12 11:06:21,044 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1534.1486, 3194.131, 147.0499, 866.37683, 3180.861, 666.77454, 1854.0085, 161.68556, 1283.6716, 158.96956]
2025-09-12 11:06:21,045 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [494.0, 1000.0, 73.0, 277.0, 1000.0, 238.0, 584.0, 79.0, 406.0, 79.0]
2025-09-12 11:06:21,055 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 51/100 (estimated time remaining: 11 hours, 21 minutes, 46 seconds)
2025-09-12 11:17:38,989 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:17:38,994 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:19:40,706 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1340.57019 ± 1062.990
2025-09-12 11:19:40,713 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3185.2485, 693.8148, 890.1212, 3197.4836, 868.36273, 120.89717, 1691.8557, 1905.0938, 385.9835, 466.84106]
2025-09-12 11:19:40,713 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 255.0, 302.0, 1000.0, 273.0, 62.0, 529.0, 604.0, 158.0, 175.0]
2025-09-12 11:19:40,721 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 52/100 (estimated time remaining: 10 hours, 48 minutes, 24 seconds)
2025-09-12 11:31:59,817 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:31:59,820 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:34:33,167 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1398.43945 ± 1046.280
2025-09-12 11:34:33,168 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [984.8386, 1684.451, 769.7491, 226.22574, 907.9651, 67.83257, 3121.847, 3149.804, 2183.3586, 888.3236]
2025-09-12 11:34:33,168 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [328.0, 551.0, 268.0, 100.0, 300.0, 41.0, 978.0, 1000.0, 657.0, 317.0]
2025-09-12 11:34:33,178 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 53/100 (estimated time remaining: 10 hours, 55 minutes, 50 seconds)
2025-09-12 11:47:29,503 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:47:29,508 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:49:26,670 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1073.76611 ± 878.066
2025-09-12 11:49:26,694 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [715.2722, 203.24612, 2207.1365, 1208.9795, 79.56839, 696.4892, 1256.2883, 3021.1357, 1025.5879, 323.95798]
2025-09-12 11:49:26,694 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [250.0, 92.0, 666.0, 391.0, 48.0, 252.0, 407.0, 902.0, 345.0, 135.0]
2025-09-12 11:49:26,707 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 54/100 (estimated time remaining: 10 hours, 54 minutes, 6 seconds)
2025-09-12 12:00:38,346 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:00:38,350 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:02:29,495 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1230.65063 ± 752.754
2025-09-12 12:02:29,497 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [794.5144, 1179.4417, 2216.4458, 2152.525, 147.9662, 1250.8391, 1815.806, 422.14725, 2021.9438, 304.87695]
2025-09-12 12:02:29,497 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [272.0, 385.0, 704.0, 686.0, 73.0, 404.0, 554.0, 166.0, 644.0, 131.0]
2025-09-12 12:02:29,508 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 55/100 (estimated time remaining: 10 hours, 40 minutes, 39 seconds)
2025-09-12 12:15:04,095 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:15:04,105 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:18:34,156 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1986.31323 ± 1197.659
2025-09-12 12:18:34,157 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1956.3114, 3299.2607, 2154.9617, 3234.631, 3155.4927, 185.45618, 633.9417, 3300.0225, 1607.1357, 335.91724]
2025-09-12 12:18:34,158 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [613.0, 1000.0, 674.0, 1000.0, 955.0, 85.0, 223.0, 1000.0, 515.0, 136.0]
2025-09-12 12:18:34,158 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1226 [INFO]: New best (1986.31) for latency MM1Queue_a033_s075
2025-09-12 12:18:34,183 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 56/100 (estimated time remaining: 10 hours, 49 minutes, 58 seconds)
2025-09-12 12:30:16,706 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:30:16,708 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:32:55,289 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1821.55762 ± 860.366
2025-09-12 12:32:55,290 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [2521.557, 976.8172, 1633.761, 2353.9194, 3239.8794, 1424.5311, 2682.0686, 198.93057, 1251.4495, 1932.6626]
2025-09-12 12:32:55,290 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [792.0, 325.0, 486.0, 719.0, 1000.0, 444.0, 851.0, 92.0, 415.0, 609.0]
2025-09-12 12:32:55,302 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 57/100 (estimated time remaining: 10 hours, 44 minutes, 32 seconds)
2025-09-12 12:44:51,522 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:44:51,524 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:48:11,836 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2318.08740 ± 929.245
2025-09-12 12:48:11,843 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [2698.268, 3284.8245, 758.2386, 2527.391, 2060.8203, 3228.1558, 3039.6753, 870.52826, 1471.7748, 3241.1965]
2025-09-12 12:48:11,843 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [812.0, 1000.0, 258.0, 776.0, 642.0, 1000.0, 904.0, 291.0, 463.0, 1000.0]
2025-09-12 12:48:11,844 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1226 [INFO]: New best (2318.09) for latency MM1Queue_a033_s075
2025-09-12 12:48:11,856 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 58/100 (estimated time remaining: 10 hours, 33 minutes, 20 seconds)
2025-09-12 13:00:33,757 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:00:33,760 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:03:09,173 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1794.18811 ± 797.722
2025-09-12 13:03:09,174 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [2012.3472, 2246.124, 2289.8755, 1196.9941, 1130.385, 1805.6005, 3255.413, 2261.3623, 1583.9126, 159.86787]
2025-09-12 13:03:09,174 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [603.0, 665.0, 701.0, 397.0, 357.0, 578.0, 1000.0, 666.0, 507.0, 77.0]
2025-09-12 13:03:09,183 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 59/100 (estimated time remaining: 10 hours, 19 minutes, 8 seconds)
2025-09-12 13:15:09,768 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:15:09,771 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:17:33,269 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1301.75183 ± 983.623
2025-09-12 13:17:33,273 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [174.96979, 3213.9507, 1710.7516, 96.993546, 1419.1072, 1484.8903, 2523.9246, 57.392178, 1217.4354, 1118.1025]
2025-09-12 13:17:33,273 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [82.0, 1000.0, 555.0, 54.0, 464.0, 483.0, 783.0, 35.0, 403.0, 376.0]
2025-09-12 13:17:33,287 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 60/100 (estimated time remaining: 10 hours, 15 minutes, 30 seconds)
2025-09-12 13:29:01,812 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:29:01,816 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:31:08,841 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1170.24890 ± 930.672
2025-09-12 13:31:08,843 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1438.109, 685.7961, 3277.9517, 218.37582, 278.80197, 919.101, 234.32089, 1186.4246, 1165.2728, 2298.3362]
2025-09-12 13:31:08,843 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [477.0, 238.0, 1000.0, 98.0, 119.0, 310.0, 103.0, 380.0, 377.0, 694.0]
2025-09-12 13:31:08,884 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 61/100 (estimated time remaining: 9 hours, 40 minutes, 37 seconds)
2025-09-12 13:43:22,099 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:43:22,103 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:45:22,334 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1397.93530 ± 1267.524
2025-09-12 13:45:22,352 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1324.5295, 974.97455, 3209.4385, 1131.3326, 204.26839, 417.82013, 113.44526, 3265.936, 3227.2825, 110.325645]
2025-09-12 13:45:22,352 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [430.0, 338.0, 1000.0, 357.0, 92.0, 166.0, 62.0, 1000.0, 1000.0, 62.0]
2025-09-12 13:45:22,364 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 62/100 (estimated time remaining: 9 hours, 25 minutes, 7 seconds)
2025-09-12 13:56:47,391 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:56:47,393 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:00:08,257 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2375.04321 ± 929.770
2025-09-12 14:00:08,257 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3234.1387, 2095.2732, 1784.3563, 3224.5457, 3265.3843, 1761.8856, 1111.8903, 3230.997, 794.14526, 3247.814]
2025-09-12 14:00:08,257 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 647.0, 550.0, 1000.0, 1000.0, 545.0, 351.0, 1000.0, 277.0, 1000.0]
2025-09-12 14:00:08,257 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1226 [INFO]: New best (2375.04) for latency MM1Queue_a033_s075
2025-09-12 14:00:08,266 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 63/100 (estimated time remaining: 9 hours, 6 minutes, 44 seconds)
2025-09-12 14:11:44,225 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:11:44,229 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:12:57,887 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 820.62079 ± 657.272
2025-09-12 14:12:57,888 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [439.47556, 310.58978, 486.5819, 1783.377, 345.45413, 1711.8734, 123.50538, 216.31995, 978.4572, 1810.5739]
2025-09-12 14:12:57,888 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [170.0, 127.0, 183.0, 552.0, 143.0, 528.0, 64.0, 100.0, 323.0, 542.0]
2025-09-12 14:12:57,899 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 64/100 (estimated time remaining: 8 hours, 36 minutes, 36 seconds)
2025-09-12 14:23:59,535 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:23:59,540 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:26:18,025 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1618.61633 ± 1139.941
2025-09-12 14:26:18,026 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [157.27672, 791.1722, 1970.9169, 3205.9583, 1304.2385, 3170.082, 426.86237, 298.57837, 2951.8875, 1909.1907]
2025-09-12 14:26:18,026 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [78.0, 280.0, 632.0, 1000.0, 414.0, 1000.0, 168.0, 129.0, 922.0, 619.0]
2025-09-12 14:26:18,079 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 65/100 (estimated time remaining: 8 hours, 14 minutes, 58 seconds)
2025-09-12 14:38:07,750 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:38:07,760 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:40:19,674 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1571.33179 ± 993.079
2025-09-12 14:40:19,675 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [148.41031, 1472.6385, 162.55948, 3341.1252, 2668.428, 1215.9962, 2485.2537, 1044.3057, 1952.9507, 1221.6493]
2025-09-12 14:40:19,675 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [72.0, 461.0, 79.0, 1000.0, 813.0, 383.0, 757.0, 346.0, 620.0, 379.0]
2025-09-12 14:40:19,683 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 66/100 (estimated time remaining: 8 hours, 4 minutes, 15 seconds)
2025-09-12 14:51:16,891 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:51:16,893 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:55:22,000 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2408.14966 ± 977.044
2025-09-12 14:55:22,039 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3258.0452, 1055.6552, 3179.924, 1455.3761, 3172.9443, 548.4401, 2518.9534, 3234.1, 3275.6743, 2382.3845]
2025-09-12 14:55:22,039 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 351.0, 1000.0, 473.0, 1000.0, 201.0, 782.0, 1000.0, 1000.0, 738.0]
2025-09-12 14:55:22,039 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1226 [INFO]: New best (2408.15) for latency MM1Queue_a033_s075
2025-09-12 14:55:22,051 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 67/100 (estimated time remaining: 7 hours, 55 minutes, 57 seconds)
2025-09-12 15:07:23,061 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:07:23,065 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:10:45,362 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1958.70093 ± 1072.535
2025-09-12 15:10:45,363 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [2634.6245, 2228.2961, 1400.944, 3219.4463, 1282.3134, 473.66702, 3277.2202, 1537.0669, 3253.3845, 280.0477]
2025-09-12 15:10:45,363 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [817.0, 685.0, 447.0, 1000.0, 401.0, 182.0, 1000.0, 499.0, 1000.0, 121.0]
2025-09-12 15:10:45,374 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 68/100 (estimated time remaining: 7 hours, 46 minutes, 4 seconds)
2025-09-12 15:21:36,389 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:21:36,399 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:24:30,087 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1718.47388 ± 1226.288
2025-09-12 15:24:30,088 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [406.04962, 3326.1335, 1566.0354, 1865.0103, 504.35577, 3362.929, 2242.219, 3239.431, 113.5021, 559.07227]
2025-09-12 15:24:30,088 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [158.0, 1000.0, 476.0, 566.0, 187.0, 1000.0, 676.0, 964.0, 60.0, 202.0]
2025-09-12 15:24:30,101 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 69/100 (estimated time remaining: 7 hours, 37 minutes, 50 seconds)
2025-09-12 15:36:37,670 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:36:37,674 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:39:48,417 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1825.45044 ± 1000.472
2025-09-12 15:39:48,418 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1217.6696, 3256.9514, 1809.87, 3174.2166, 828.39966, 1387.8684, 1518.5511, 1703.5378, 3167.6357, 189.80547]
2025-09-12 15:39:48,418 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [408.0, 1000.0, 565.0, 986.0, 287.0, 449.0, 488.0, 540.0, 971.0, 91.0]
2025-09-12 15:39:48,430 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 70/100 (estimated time remaining: 7 hours, 35 minutes, 44 seconds)
2025-09-12 15:50:35,999 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:50:36,001 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:53:35,707 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2103.25439 ± 1003.935
2025-09-12 15:53:35,714 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1603.042, 3270.7441, 3221.887, 2571.5369, 3214.225, 2539.6436, 1497.8389, 522.1748, 459.14297, 2132.3096]
2025-09-12 15:53:35,714 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [512.0, 1000.0, 1000.0, 792.0, 1000.0, 795.0, 487.0, 198.0, 176.0, 672.0]
2025-09-12 15:53:35,725 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 71/100 (estimated time remaining: 7 hours, 19 minutes, 36 seconds)
2025-09-12 16:05:09,891 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:05:09,895 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:08:28,260 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2364.46313 ± 1169.703
2025-09-12 16:08:28,261 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3256.358, 1677.3171, 3247.943, 3244.8838, 148.62805, 3254.85, 552.9172, 3238.625, 1781.1864, 3241.9216]
2025-09-12 16:08:28,261 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 527.0, 1000.0, 1000.0, 75.0, 1000.0, 204.0, 1000.0, 563.0, 1000.0]
2025-09-12 16:08:28,274 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 72/100 (estimated time remaining: 7 hours, 4 minutes)
2025-09-12 16:20:07,645 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:20:07,648 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:22:53,707 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1946.16638 ± 1294.368
2025-09-12 16:22:53,708 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [180.32588, 3259.6016, 126.975, 1748.9458, 3284.0728, 135.28758, 1757.3376, 2640.222, 3224.267, 3104.628]
2025-09-12 16:22:53,708 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [84.0, 1000.0, 65.0, 549.0, 1000.0, 68.0, 558.0, 802.0, 1000.0, 1000.0]
2025-09-12 16:22:53,716 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 73/100 (estimated time remaining: 6 hours, 43 minutes, 58 seconds)
2025-09-12 16:34:25,082 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:34:25,085 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:37:11,527 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2008.27954 ± 1034.424
2025-09-12 16:37:11,528 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3405.004, 3295.3604, 1815.466, 998.91425, 1552.5426, 344.7134, 1214.3483, 2909.2961, 1413.6766, 3133.4749]
2025-09-12 16:37:11,528 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 552.0, 324.0, 482.0, 140.0, 372.0, 862.0, 467.0, 1000.0]
2025-09-12 16:37:11,553 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 74/100 (estimated time remaining: 6 hours, 32 minutes, 31 seconds)
2025-09-12 16:48:24,214 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:48:24,216 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:52:48,039 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2621.61108 ± 1266.615
2025-09-12 16:52:48,039 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3324.6926, 3174.487, 2960.3901, 96.072914, 3297.2354, 3322.4978, 3300.777, 3342.3264, 3298.5828, 99.048294]
2025-09-12 16:52:48,039 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [992.0, 1000.0, 908.0, 53.0, 995.0, 1000.0, 1000.0, 1000.0, 1000.0, 58.0]
2025-09-12 16:52:48,040 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1226 [INFO]: New best (2621.61) for latency MM1Queue_a033_s075
2025-09-12 16:52:48,051 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 75/100 (estimated time remaining: 6 hours, 19 minutes, 34 seconds)
2025-09-12 17:04:08,051 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:04:08,056 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:07:17,560 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1821.07202 ± 1057.816
2025-09-12 17:07:17,566 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [461.67432, 3250.4248, 2371.7402, 1839.0393, 1232.4401, 1601.2483, 3179.3342, 461.71762, 703.837, 3109.2644]
2025-09-12 17:07:17,566 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [174.0, 1000.0, 733.0, 591.0, 390.0, 500.0, 1000.0, 174.0, 242.0, 980.0]
2025-09-12 17:07:17,617 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 76/100 (estimated time remaining: 6 hours, 8 minutes, 29 seconds)
2025-09-12 17:19:08,224 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:19:08,228 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:22:09,906 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2214.34399 ± 1267.062
2025-09-12 17:22:09,907 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1837.711, 157.14319, 3266.775, 3317.8345, 3334.5642, 3343.0205, 3247.2732, 80.49324, 2422.6086, 1136.0143]
2025-09-12 17:22:09,907 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [572.0, 76.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 47.0, 740.0, 371.0]
2025-09-12 17:22:09,916 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 77/100 (estimated time remaining: 5 hours, 53 minutes, 43 seconds)
2025-09-12 17:33:36,590 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:33:36,595 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:35:47,655 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1550.90869 ± 747.175
2025-09-12 17:35:47,657 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1379.1318, 1551.5846, 516.08185, 2084.7942, 2440.1233, 1729.7722, 845.9071, 1838.3359, 379.57626, 2743.7805]
2025-09-12 17:35:47,657 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [452.0, 476.0, 189.0, 630.0, 758.0, 556.0, 286.0, 588.0, 150.0, 837.0]
2025-09-12 17:35:47,687 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 78/100 (estimated time remaining: 5 hours, 35 minutes, 20 seconds)
2025-09-12 17:47:43,016 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:47:43,020 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:50:01,141 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1640.84900 ± 1089.117
2025-09-12 17:50:01,142 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [2096.568, 695.8484, 346.09552, 1737.9346, 3239.6094, 1960.9492, 575.5149, 2247.8567, 3317.3403, 190.77399]
2025-09-12 17:50:01,142 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [618.0, 248.0, 141.0, 559.0, 1000.0, 602.0, 214.0, 663.0, 1000.0, 89.0]
2025-09-12 17:50:01,152 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 79/100 (estimated time remaining: 5 hours, 20 minutes, 26 seconds)
2025-09-12 18:01:12,543 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:01:12,558 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:03:28,305 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1580.18921 ± 1230.922
2025-09-12 18:03:28,306 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3266.8665, 354.311, 601.8571, 520.51984, 3270.313, 560.53796, 1040.2583, 2321.818, 3299.768, 565.64307]
2025-09-12 18:03:28,306 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 141.0, 208.0, 196.0, 1000.0, 199.0, 339.0, 718.0, 1000.0, 202.0]
2025-09-12 18:03:28,325 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 80/100 (estimated time remaining: 4 hours, 56 minutes, 49 seconds)
2025-09-12 18:14:56,921 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:14:56,926 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:18:09,060 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2336.10205 ± 1207.587
2025-09-12 18:18:09,069 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3343.3547, 3334.8618, 2316.8098, 1395.6937, 3347.5386, 2507.4072, 203.01575, 3294.169, 290.8517, 3327.3157]
2025-09-12 18:18:09,069 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 714.0, 441.0, 1000.0, 761.0, 95.0, 1000.0, 125.0, 1000.0]
2025-09-12 18:18:09,087 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 81/100 (estimated time remaining: 4 hours, 43 minutes, 25 seconds)
2025-09-12 18:29:50,717 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:29:50,725 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:34:07,119 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2586.92627 ± 1114.087
2025-09-12 18:34:07,120 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [2030.661, 3352.389, 241.48466, 3337.1558, 3338.9644, 819.4137, 3337.4082, 3348.467, 3366.4653, 2696.8557]
2025-09-12 18:34:07,120 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [619.0, 1000.0, 105.0, 1000.0, 1000.0, 281.0, 1000.0, 1000.0, 1000.0, 813.0]
2025-09-12 18:34:07,133 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 82/100 (estimated time remaining: 4 hours, 33 minutes, 25 seconds)
2025-09-12 18:46:06,979 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:46:06,983 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:47:31,820 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 964.98242 ± 1189.060
2025-09-12 18:47:31,822 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [128.8775, 3325.5918, 268.75717, 987.0402, 3268.108, 194.20906, 232.26216, 494.98438, 496.35718, 253.63707]
2025-09-12 18:47:31,822 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [65.0, 1000.0, 113.0, 331.0, 1000.0, 91.0, 102.0, 188.0, 182.0, 111.0]
2025-09-12 18:47:31,847 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 83/100 (estimated time remaining: 4 hours, 18 minutes, 14 seconds)
2025-09-12 18:58:18,644 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:58:18,648 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:01:03,737 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1966.71448 ± 1074.729
2025-09-12 19:01:03,738 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [2031.7754, 909.8935, 3368.0837, 2303.8662, 1846.3629, 3218.629, 3319.9492, 1853.3578, 586.98956, 228.23616]
2025-09-12 19:01:03,738 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [630.0, 310.0, 1000.0, 700.0, 561.0, 1000.0, 1000.0, 584.0, 212.0, 101.0]
2025-09-12 19:01:03,750 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 84/100 (estimated time remaining: 4 hours, 1 minute, 32 seconds)
2025-09-12 19:12:25,933 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:12:25,937 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:14:20,548 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1350.64905 ± 1317.115
2025-09-12 19:14:20,550 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1217.2139, 3320.939, 711.55096, 243.35416, 535.7927, 495.6237, 179.79652, 3293.723, 181.6306, 3326.8647]
2025-09-12 19:14:20,550 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [394.0, 1000.0, 243.0, 110.0, 196.0, 186.0, 86.0, 986.0, 87.0, 1000.0]
2025-09-12 19:14:20,562 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 85/100 (estimated time remaining: 3 hours, 46 minutes, 47 seconds)
2025-09-12 19:26:17,624 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:26:17,627 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:29:36,554 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2371.48193 ± 1013.137
2025-09-12 19:29:36,561 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3221.801, 3250.0034, 1212.4047, 1476.8308, 3319.2693, 905.374, 2872.9219, 3202.2344, 1005.1975, 3248.7827]
2025-09-12 19:29:36,561 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 399.0, 474.0, 1000.0, 308.0, 889.0, 1000.0, 335.0, 1000.0]
2025-09-12 19:29:36,579 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 86/100 (estimated time remaining: 3 hours, 34 minutes, 22 seconds)
2025-09-12 19:41:42,532 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:41:42,543 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:45:50,140 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 3029.85400 ± 565.471
2025-09-12 19:45:50,141 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3318.384, 3332.1619, 3302.5486, 2025.176, 3305.3945, 3344.746, 1784.5482, 3323.96, 3301.6814, 3259.9397]
2025-09-12 19:45:50,142 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 638.0, 1000.0, 1000.0, 555.0, 1000.0, 1000.0, 1000.0]
2025-09-12 19:45:50,142 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1226 [INFO]: New best (3029.85) for latency MM1Queue_a033_s075
2025-09-12 19:45:50,154 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 87/100 (estimated time remaining: 3 hours, 20 minutes, 48 seconds)
2025-09-12 19:56:38,549 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:56:38,553 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:59:02,057 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1387.74585 ± 1172.806
2025-09-12 19:59:02,059 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3346.5493, 539.2109, 2500.7192, 3355.0015, 1051.3492, 188.09486, 829.85254, 1377.1294, 187.04858, 502.50247]
2025-09-12 19:59:02,059 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 199.0, 761.0, 1000.0, 346.0, 89.0, 282.0, 435.0, 87.0, 188.0]
2025-09-12 19:59:02,075 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 88/100 (estimated time remaining: 3 hours, 5 minutes, 54 seconds)
2025-09-12 20:11:35,867 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:11:35,871 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:15:36,361 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2442.05518 ± 1240.263
2025-09-12 20:15:36,367 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [496.77356, 2163.8716, 314.90585, 3385.72, 3421.332, 3375.2224, 3387.7976, 3381.0864, 3347.7397, 1146.1011]
2025-09-12 20:15:36,367 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [186.0, 651.0, 129.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 366.0]
2025-09-12 20:15:36,377 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 89/100 (estimated time remaining: 2 hours, 58 minutes, 54 seconds)
2025-09-12 20:26:55,587 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:26:55,597 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:30:51,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2387.52002 ± 1277.185
2025-09-12 20:30:51,093 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3426.8198, 142.11133, 3368.085, 3384.1702, 1697.7802, 3424.92, 3441.3767, 3283.9197, 1025.7162, 680.3009]
2025-09-12 20:30:51,093 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 70.0, 1000.0, 1000.0, 517.0, 1000.0, 1000.0, 1000.0, 335.0, 235.0]
2025-09-12 20:30:51,103 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 90/100 (estimated time remaining: 2 hours, 48 minutes, 19 seconds)
2025-09-12 20:42:06,829 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:42:06,840 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:45:36,671 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2479.34766 ± 1137.346
2025-09-12 20:45:36,688 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [116.39698, 3261.4832, 3227.0894, 3199.5903, 3137.0728, 738.50446, 3132.0234, 3193.7905, 1603.7539, 3183.773]
2025-09-12 20:45:36,688 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [61.0, 1000.0, 1000.0, 1000.0, 1000.0, 254.0, 1000.0, 1000.0, 502.0, 1000.0]
2025-09-12 20:45:36,702 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 91/100 (estimated time remaining: 2 hours, 32 minutes)
2025-09-12 20:57:20,277 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:57:20,279 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:00:05,624 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2000.81934 ± 1327.430
2025-09-12 21:00:05,630 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3408.084, 3296.806, 844.1257, 710.1508, 2599.2703, 3230.7043, 2202.0884, 23.086979, 3441.7598, 252.11632]
2025-09-12 21:00:05,630 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 275.0, 244.0, 790.0, 1000.0, 677.0, 20.0, 1000.0, 109.0]
2025-09-12 21:00:05,641 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 92/100 (estimated time remaining: 2 hours, 13 minutes, 39 seconds)
2025-09-12 21:12:01,745 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:12:01,783 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:15:32,754 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2051.50098 ± 1109.872
2025-09-12 21:15:32,755 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1750.4625, 364.6802, 3099.3733, 2721.2024, 1145.0144, 525.62885, 3206.4766, 3216.97, 3222.4717, 1262.7286]
2025-09-12 21:15:32,755 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [531.0, 140.0, 963.0, 844.0, 364.0, 194.0, 1000.0, 1000.0, 1000.0, 409.0]
2025-09-12 21:15:32,772 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 93/100 (estimated time remaining: 2 hours, 2 minutes, 25 seconds)
2025-09-12 21:27:05,200 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:27:05,208 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:30:12,442 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2329.87402 ± 1185.099
2025-09-12 21:30:12,443 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3361.3794, 3094.8804, 2443.8074, 3362.4204, 2315.735, 590.29755, 3355.8257, 3398.1648, 94.26475, 1281.963]
2025-09-12 21:30:12,443 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 922.0, 736.0, 1000.0, 702.0, 210.0, 1000.0, 1000.0, 53.0, 412.0]
2025-09-12 21:30:12,479 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 44 minutes, 26 seconds)
2025-09-12 21:41:58,903 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:41:58,906 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:45:12,458 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2389.23047 ± 1217.275
2025-09-12 21:45:12,460 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3264.8794, 3278.5083, 3357.2383, 2118.1924, 3359.3367, 363.2043, 3293.8176, 3331.6482, 1083.8635, 441.61755]
2025-09-12 21:45:12,460 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [977.0, 1000.0, 1000.0, 639.0, 1000.0, 143.0, 981.0, 1000.0, 357.0, 165.0]
2025-09-12 21:45:12,473 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 29 minutes, 13 seconds)
2025-09-12 21:56:39,476 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:56:39,485 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:59:40,103 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2211.23535 ± 1111.024
2025-09-12 21:59:40,104 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [2328.6611, 2070.9229, 1691.7335, 3412.8755, 2514.598, 3410.444, 3383.739, 427.7064, 149.7404, 2721.934]
2025-09-12 21:59:40,104 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [718.0, 632.0, 524.0, 1000.0, 749.0, 1000.0, 1000.0, 164.0, 73.0, 811.0]
2025-09-12 21:59:40,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 14 minutes, 3 seconds)
2025-09-12 22:11:07,874 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:11:07,889 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:14:53,576 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2247.99463 ± 1199.996
2025-09-12 22:14:53,577 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1897.4675, 1473.2759, 1962.7915, 3345.6584, 3306.8838, 3401.3564, 3228.0095, 3313.015, 95.5789, 455.9095]
2025-09-12 22:14:53,577 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [588.0, 461.0, 604.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 52.0, 171.0]
2025-09-12 22:14:53,594 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 97/100 (estimated time remaining: 59 minutes, 50 seconds)
2025-09-12 22:26:43,381 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:26:43,393 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:30:39,676 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2368.73511 ± 1380.445
2025-09-12 22:30:39,677 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3382.2966, 3413.8186, 707.1192, 34.241375, 3341.3699, 3360.7852, 130.1174, 2788.6523, 3305.2083, 3223.7417]
2025-09-12 22:30:39,677 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 239.0, 28.0, 1000.0, 1000.0, 68.0, 830.0, 1000.0, 1000.0]
2025-09-12 22:30:39,687 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 98/100 (estimated time remaining: 45 minutes, 4 seconds)
2025-09-12 22:42:02,268 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:42:02,277 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:44:58,469 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1726.14941 ± 1015.229
2025-09-12 22:44:58,470 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [222.52017, 3357.119, 372.8247, 2034.0941, 1418.7317, 3318.2026, 1178.8348, 1482.2461, 1501.8339, 2375.0862]
2025-09-12 22:44:58,470 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [100.0, 1000.0, 150.0, 625.0, 445.0, 1000.0, 386.0, 472.0, 466.0, 733.0]
2025-09-12 22:44:58,482 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 99/100 (estimated time remaining: 29 minutes, 54 seconds)
2025-09-12 22:56:27,205 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:56:27,215 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:59:42,678 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2427.82544 ± 1086.456
2025-09-12 22:59:42,681 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3426.3757, 147.61235, 3272.0305, 2442.1912, 3323.6033, 3387.1377, 1440.2719, 3363.434, 1308.3209, 2167.2766]
2025-09-12 22:59:42,681 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 73.0, 977.0, 736.0, 1000.0, 1000.0, 453.0, 1000.0, 420.0, 657.0]
2025-09-12 22:59:42,696 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 100/100 (estimated time remaining: 14 minutes, 54 seconds)
2025-09-12 23:11:04,940 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:11:04,950 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 23:14:33,362 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2088.16382 ± 1283.185
2025-09-12 23:14:33,363 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1276.465, 3030.1914, 187.77151, 1056.5472, 3350.4055, 3410.7375, 1700.4812, 3356.4333, 187.52667, 3325.0774]
2025-09-12 23:14:33,364 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [405.0, 896.0, 87.0, 344.0, 1000.0, 1000.0, 521.0, 1000.0, 88.0, 1000.0]
2025-09-12 23:14:33,391 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1251 [DEBUG]: Training session finished
