2025-09-11 23:42:56,459 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noiseperc15-halfcheetah/MM1Queue_a033_s075-mbpac_memdelay
2025-09-11 23:42:56,459 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noiseperc15-halfcheetah/MM1Queue_a033_s075-mbpac_memdelay
2025-09-11 23:42:56,459 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x14b287d11250>}
2025-09-11 23:42:56,459 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1111 [DEBUG]: using device: cuda
2025-09-11 23:42:56,464 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1133 [INFO]: Creating new trainer
2025-09-11 23:42:56,480 baseline-mbpac-noiseperc15-halfcheetah:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-09-11 23:42:56,480 baseline-mbpac-noiseperc15-halfcheetah:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-11 23:42:56,488 baseline-mbpac-noiseperc15-halfcheetah:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=6, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 384, batch_first=True)
)
2025-09-11 23:42:57,613 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1194 [DEBUG]: Starting training session...
2025-09-11 23:42:57,613 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 1/100
2025-09-11 23:52:54,071 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:52:54,075 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-11 23:57:27,155 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: -217.43086 ± 27.042
2025-09-11 23:57:27,161 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [-233.02951, -211.23279, -218.27673, -171.6829, -188.65988, -249.70226, -218.74577, -267.47467, -221.43655, -194.06754]
2025-09-11 23:57:27,161 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 23:57:27,161 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1226 [INFO]: New best (-217.43) for latency MM1Queue_a033_s075
2025-09-11 23:57:27,165 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 23 hours, 54 minutes, 45 seconds)
2025-09-12 00:08:30,853 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:08:30,863 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 00:13:03,871 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: -216.78781 ± 54.540
2025-09-12 00:13:03,874 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [-282.7202, -222.44641, -186.90987, -303.30792, -142.99417, -202.63406, -268.86176, -197.3986, -127.42038, -233.18486]
2025-09-12 00:13:03,874 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 00:13:03,874 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1226 [INFO]: New best (-216.79) for latency MM1Queue_a033_s075
2025-09-12 00:13:03,883 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 24 hours, 35 minutes, 7 seconds)
2025-09-12 00:24:08,509 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:24:08,514 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 00:28:41,328 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 195.88867 ± 102.955
2025-09-12 00:28:41,330 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [113.059456, 103.02451, 308.4146, 120.6102, 207.12816, 389.62753, 261.88464, 59.284145, 273.74734, 122.10621]
2025-09-12 00:28:41,330 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 00:28:41,330 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1226 [INFO]: New best (195.89) for latency MM1Queue_a033_s075
2025-09-12 00:28:41,344 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 24 hours, 38 minutes, 33 seconds)
2025-09-12 00:39:45,159 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:39:45,186 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 00:44:17,525 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 370.64273 ± 194.896
2025-09-12 00:44:17,528 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [121.344345, 445.10635, 218.96336, 625.5085, 522.6748, 233.13824, 526.2617, 17.707558, 475.33563, 520.3864]
2025-09-12 00:44:17,528 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 00:44:17,528 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1226 [INFO]: New best (370.64) for latency MM1Queue_a033_s075
2025-09-12 00:44:17,533 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 24 hours, 31 minutes, 58 seconds)
2025-09-12 00:55:22,841 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:55:22,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 00:59:51,875 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 811.34741 ± 171.763
2025-09-12 00:59:51,876 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [834.6503, 815.24536, 877.0009, 632.81616, 867.5104, 953.2604, 934.1891, 370.9893, 864.1733, 963.6387]
2025-09-12 00:59:51,876 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 00:59:51,876 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1226 [INFO]: New best (811.35) for latency MM1Queue_a033_s075
2025-09-12 00:59:51,881 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 24 hours, 21 minutes, 11 seconds)
2025-09-12 01:10:57,231 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:10:57,238 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:15:28,169 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1194.22156 ± 466.563
2025-09-12 01:15:28,176 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1463.7733, 860.0384, 1641.3184, 1651.7952, 1582.0813, 1802.6782, 407.59552, 625.8369, 904.2208, 1002.87805]
2025-09-12 01:15:28,176 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 01:15:28,176 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1226 [INFO]: New best (1194.22) for latency MM1Queue_a033_s075
2025-09-12 01:15:28,181 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 24 hours, 26 minutes, 43 seconds)
2025-09-12 01:26:34,368 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:26:34,373 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:31:01,230 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1129.24365 ± 797.714
2025-09-12 01:31:01,231 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [80.8737, 87.67952, 2118.8235, 364.7184, 518.2606, 797.2131, 2033.4644, 1794.8617, 1518.9501, 1977.5917]
2025-09-12 01:31:01,231 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 01:31:01,234 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 24 hours, 9 minutes, 58 seconds)
2025-09-12 01:42:07,008 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:42:07,014 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:46:33,375 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2140.67847 ± 402.082
2025-09-12 01:46:33,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2418.1033, 2300.644, 1007.40674, 2051.8054, 2386.9414, 2443.265, 2208.781, 2370.4077, 2037.5693, 2181.8635]
2025-09-12 01:46:33,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 01:46:33,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1226 [INFO]: New best (2140.68) for latency MM1Queue_a033_s075
2025-09-12 01:46:33,383 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 23 hours, 52 minutes, 45 seconds)
2025-09-12 01:57:41,052 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:57:41,062 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:02:13,876 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2430.94580 ± 159.460
2025-09-12 02:02:13,877 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2523.3608, 2165.8362, 2652.0725, 2411.5178, 2694.2317, 2251.2825, 2290.107, 2408.323, 2418.0376, 2494.6887]
2025-09-12 02:02:13,878 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 02:02:13,878 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1226 [INFO]: New best (2430.95) for latency MM1Queue_a033_s075
2025-09-12 02:02:13,886 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 23 hours, 38 minutes, 29 seconds)
2025-09-12 02:13:29,420 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:13:29,424 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:18:02,708 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2211.27881 ± 389.417
2025-09-12 02:18:02,710 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1216.4105, 2746.6484, 2177.6182, 2102.3706, 2341.0464, 2146.5261, 2487.7783, 2345.4756, 2502.0535, 2046.86]
2025-09-12 02:18:02,710 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 02:18:02,718 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 23 hours, 27 minutes, 15 seconds)
2025-09-12 02:29:09,395 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:29:09,399 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:33:38,447 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2310.42627 ± 969.523
2025-09-12 02:33:38,448 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2901.1555, 3057.3923, 2843.2898, 2526.434, 2603.2532, 103.7478, 2670.745, 2527.0452, 3108.0552, 763.1451]
2025-09-12 02:33:38,448 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 02:33:38,454 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 23 hours, 11 minutes, 26 seconds)
2025-09-12 02:44:45,826 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:44:45,831 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:49:14,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2511.64282 ± 701.297
2025-09-12 02:49:14,849 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3103.2373, 1233.3943, 2841.0273, 2869.0244, 3100.4753, 2980.218, 2787.4004, 2752.6125, 2340.3657, 1108.6727]
2025-09-12 02:49:14,849 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 02:49:14,849 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1226 [INFO]: New best (2511.64) for latency MM1Queue_a033_s075
2025-09-12 02:49:14,857 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 22 hours, 56 minutes, 47 seconds)
2025-09-12 03:00:26,891 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:00:26,896 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:04:54,368 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2738.01343 ± 924.001
2025-09-12 03:04:54,371 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3042.3926, 3405.2656, 3097.2007, 2996.3542, 2728.3096, 3290.7095, 2648.5508, 40.564552, 3066.3306, 3064.4573]
2025-09-12 03:04:54,371 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 03:04:54,371 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1226 [INFO]: New best (2738.01) for latency MM1Queue_a033_s075
2025-09-12 03:04:54,379 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 22 hours, 43 minutes, 17 seconds)
2025-09-12 03:15:58,735 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:15:58,747 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:20:29,178 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2758.19434 ± 690.062
2025-09-12 03:20:29,178 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3253.9924, 2283.1296, 2696.9656, 3149.93, 3344.9885, 3103.4065, 2772.1392, 887.0831, 3002.0967, 3088.2136]
2025-09-12 03:20:29,178 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 03:20:29,178 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1226 [INFO]: New best (2758.19) for latency MM1Queue_a033_s075
2025-09-12 03:20:29,184 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 22 hours, 25 minutes, 59 seconds)
2025-09-12 03:31:30,081 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:31:30,086 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:35:59,255 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3063.16870 ± 573.196
2025-09-12 03:35:59,260 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3227.8047, 3408.008, 3503.4065, 2904.4688, 3271.8442, 3312.019, 3341.0994, 3361.9556, 2851.526, 1449.5573]
2025-09-12 03:35:59,260 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 03:35:59,260 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1226 [INFO]: New best (3063.17) for latency MM1Queue_a033_s075
2025-09-12 03:35:59,269 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 22 hours, 5 minutes, 1 second)
2025-09-12 03:47:04,173 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:47:04,178 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:51:33,321 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2966.77539 ± 696.277
2025-09-12 03:51:33,327 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3545.1394, 3315.8174, 3346.0205, 2719.5327, 3334.7473, 3081.372, 3591.683, 1368.4551, 2000.0635, 3364.9216]
2025-09-12 03:51:33,328 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 03:51:33,333 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 21 hours, 48 minutes, 57 seconds)
2025-09-12 04:02:41,838 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:02:41,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:07:09,778 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2693.60864 ± 1119.815
2025-09-12 04:07:09,780 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3229.4607, 3486.7942, 814.3137, 3186.0452, 3555.4792, 3387.9734, 1922.1116, 3259.7249, 486.21005, 3607.972]
2025-09-12 04:07:09,781 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 04:07:09,789 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 21 hours, 33 minutes, 23 seconds)
2025-09-12 04:18:17,195 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:18:17,202 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:22:48,193 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3456.45776 ± 142.926
2025-09-12 04:22:48,194 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3327.3474, 3406.4646, 3338.2678, 3266.2131, 3556.9707, 3422.0303, 3350.4487, 3594.4707, 3744.83, 3557.5369]
2025-09-12 04:22:48,194 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 04:22:48,194 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1226 [INFO]: New best (3456.46) for latency MM1Queue_a033_s075
2025-09-12 04:22:48,201 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 21 hours, 17 minutes, 30 seconds)
2025-09-12 04:33:56,461 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:33:56,467 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:38:27,701 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2704.91333 ± 1350.119
2025-09-12 04:38:27,710 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3776.136, 775.60504, 3303.9297, -75.61777, 3649.2473, 3612.2346, 3551.2356, 1452.9882, 3504.576, 3498.799]
2025-09-12 04:38:27,710 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 04:38:27,723 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 21 hours, 3 minutes, 12 seconds)
2025-09-12 04:49:36,425 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:49:36,437 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:54:02,922 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3419.56787 ± 217.776
2025-09-12 04:54:02,923 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3513.2783, 3333.9177, 3731.05, 2921.094, 3421.3384, 3653.0435, 3429.2249, 3471.033, 3205.13, 3516.57]
2025-09-12 04:54:02,923 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 04:54:02,929 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 20 hours, 48 minutes, 58 seconds)
2025-09-12 05:05:10,062 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:05:10,066 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:09:44,099 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3516.98315 ± 142.115
2025-09-12 05:09:44,099 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3549.5688, 3405.4216, 3447.5479, 3386.467, 3368.1362, 3500.789, 3817.7737, 3399.1855, 3701.2292, 3593.7122]
2025-09-12 05:09:44,100 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 05:09:44,100 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1226 [INFO]: New best (3516.98) for latency MM1Queue_a033_s075
2025-09-12 05:09:44,106 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 20 hours, 35 minutes, 14 seconds)
2025-09-12 05:20:53,646 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:20:53,653 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:25:26,046 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2992.42041 ± 845.076
2025-09-12 05:25:26,047 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3918.733, 3699.3513, 3637.811, 1548.9498, 3224.724, 3047.7742, 3369.733, 2389.2898, 1467.3241, 3620.5154]
2025-09-12 05:25:26,047 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 05:25:26,059 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 20 hours, 21 minutes, 1 second)
2025-09-12 05:36:35,852 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:36:35,860 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:41:01,888 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3413.91553 ± 403.109
2025-09-12 05:41:01,889 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3282.4226, 3362.3071, 3489.6343, 3808.0227, 3874.6985, 3495.6428, 3762.0088, 3394.9966, 3299.7256, 2369.6968]
2025-09-12 05:41:01,889 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 05:41:01,896 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 20 hours, 4 minutes, 42 seconds)
2025-09-12 05:52:09,661 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:52:09,667 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:56:43,739 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3264.72144 ± 871.850
2025-09-12 05:56:43,741 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3937.4417, 3264.246, 743.94635, 3367.3743, 3197.7979, 3403.1926, 3716.9495, 3810.1604, 3723.9329, 3482.1733]
2025-09-12 05:56:43,741 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 05:56:43,751 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 19 hours, 49 minutes, 39 seconds)
2025-09-12 06:07:54,216 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:07:54,221 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:12:27,239 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3617.40479 ± 193.325
2025-09-12 06:12:27,240 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3484.948, 3420.145, 3769.8586, 3334.8118, 3524.5396, 3710.4175, 4043.2476, 3557.2085, 3725.77, 3603.1018]
2025-09-12 06:12:27,240 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 06:12:27,241 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1226 [INFO]: New best (3617.40) for latency MM1Queue_a033_s075
2025-09-12 06:12:27,247 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 19 hours, 36 minutes, 4 seconds)
2025-09-12 06:23:34,716 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:23:34,725 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:28:01,214 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2863.91187 ± 1141.434
2025-09-12 06:28:01,214 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3378.4824, 3710.7415, 661.08093, 981.11395, 3699.4456, 3567.5405, 3511.7483, 2977.8862, 4058.8904, 2092.1897]
2025-09-12 06:28:01,215 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 06:28:01,223 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 19 hours, 18 minutes, 37 seconds)
2025-09-12 06:39:09,511 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:39:09,523 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:43:42,730 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3104.28467 ± 799.781
2025-09-12 06:43:42,734 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3646.8076, 3365.7424, 2953.8015, 3547.1404, 3324.5835, 3331.5952, 759.6619, 3378.6833, 3323.7542, 3411.078]
2025-09-12 06:43:42,734 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 06:43:42,741 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 19 hours, 2 minutes, 51 seconds)
2025-09-12 06:54:52,809 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:54:52,816 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:59:28,290 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3348.37427 ± 1026.324
2025-09-12 06:59:28,292 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3639.9407, 3312.2515, 3767.5386, 3469.2905, 3897.2378, 3590.6912, 3844.8228, 4050.8638, 3579.8901, 331.21524]
2025-09-12 06:59:28,292 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 06:59:28,302 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 18 hours, 49 minutes, 32 seconds)
2025-09-12 07:10:37,845 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:10:37,850 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:15:09,225 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3653.51807 ± 159.280
2025-09-12 07:15:09,226 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3664.8447, 3506.497, 3755.962, 3444.3228, 3467.9353, 3964.5083, 3768.3745, 3515.9307, 3771.6492, 3675.1572]
2025-09-12 07:15:09,226 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 07:15:09,226 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1226 [INFO]: New best (3653.52) for latency MM1Queue_a033_s075
2025-09-12 07:15:09,232 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 18 hours, 33 minutes, 37 seconds)
2025-09-12 07:26:18,893 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:26:18,902 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:30:48,684 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3442.80591 ± 171.659
2025-09-12 07:30:48,687 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3471.081, 3700.8792, 3568.2837, 3529.5068, 3385.4124, 3370.333, 3046.8635, 3294.6274, 3504.656, 3556.4172]
2025-09-12 07:30:48,687 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 07:30:48,695 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 18 hours, 17 minutes)
2025-09-12 07:41:57,412 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:41:57,418 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:46:25,074 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3286.26245 ± 618.605
2025-09-12 07:46:25,074 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2436.6348, 1959.2479, 3768.9175, 3296.6208, 3695.2468, 2925.536, 3890.1362, 3802.3303, 3386.4807, 3701.473]
2025-09-12 07:46:25,074 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 07:46:25,081 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 18 hours, 1 minute, 53 seconds)
2025-09-12 07:57:34,660 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:57:34,671 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:02:08,170 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3176.44971 ± 1089.321
2025-09-12 08:02:08,175 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3548.626, 4037.261, 4028.017, 3803.7615, 3715.9155, 3531.799, 689.0258, 1427.1637, 3441.7053, 3541.2212]
2025-09-12 08:02:08,175 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 08:02:08,187 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 17 hours, 46 minutes, 34 seconds)
2025-09-12 08:13:18,827 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:13:18,833 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:17:50,772 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3067.57861 ± 1098.843
2025-09-12 08:17:50,773 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3214.7993, 3778.3857, 3838.3464, 653.92914, 3831.509, 3765.75, 3518.3882, 3484.8196, 1180.136, 3409.7197]
2025-09-12 08:17:50,773 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 08:17:50,782 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 17 hours, 30 minutes, 13 seconds)
2025-09-12 08:28:59,306 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:28:59,317 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:33:25,823 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2666.73682 ± 1046.404
2025-09-12 08:33:25,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3574.0532, 1037.5216, 3471.542, 2060.0242, 3250.4473, 907.2677, 3286.9758, 3476.345, 3787.3047, 1815.8871]
2025-09-12 08:33:25,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 08:33:25,836 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 17 hours, 13 minutes, 15 seconds)
2025-09-12 08:44:23,236 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:44:23,241 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:48:48,971 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3735.97656 ± 208.393
2025-09-12 08:48:48,973 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3914.9521, 3723.5063, 4118.8374, 3570.1074, 3773.2039, 3507.326, 3485.8706, 3567.979, 3678.689, 4019.2954]
2025-09-12 08:48:48,973 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 08:48:48,973 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1226 [INFO]: New best (3735.98) for latency MM1Queue_a033_s075
2025-09-12 08:48:48,980 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 16 hours, 54 minutes, 3 seconds)
2025-09-12 08:59:35,739 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:59:35,743 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:03:59,999 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3473.24731 ± 924.958
2025-09-12 09:04:00,000 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [4077.522, 4018.415, 3585.325, 3589.9536, 3327.0186, 3782.4512, 4014.9404, 777.58734, 3746.5334, 3812.7263]
2025-09-12 09:04:00,000 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 09:04:00,014 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 16 hours, 33 minutes, 3 seconds)
2025-09-12 09:14:48,482 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:14:48,492 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:19:09,231 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3282.86060 ± 1298.491
2025-09-12 09:19:09,232 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [4012.6816, 414.82913, 4042.3328, 4026.56, 4142.5225, 3483.3484, 4039.501, 1065.6459, 3510.071, 4091.112]
2025-09-12 09:19:09,232 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 09:19:09,241 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 16 hours, 10 minutes, 25 seconds)
2025-09-12 09:29:57,046 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:29:57,051 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:34:20,887 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3403.17896 ± 648.078
2025-09-12 09:34:20,894 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3802.433, 2122.5984, 3495.1514, 3682.4797, 3884.2861, 3761.0608, 3711.1765, 3639.3818, 2123.1323, 3810.0913]
2025-09-12 09:34:20,894 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 09:34:20,905 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 15 hours, 48 minutes, 37 seconds)
2025-09-12 09:45:09,926 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:45:09,931 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:49:36,403 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3114.26416 ± 1278.552
2025-09-12 09:49:36,404 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3681.906, 3607.4023, 3731.844, 1520.6992, 3677.4758, 3966.5073, 3687.8906, 3745.8801, 3688.8667, -165.82974]
2025-09-12 09:49:36,405 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 09:49:36,415 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 15 hours, 29 minutes, 21 seconds)
2025-09-12 10:00:26,476 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:00:26,483 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:04:53,379 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3559.20117 ± 720.244
2025-09-12 10:04:53,380 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3885.524, 3774.1938, 3448.322, 1502.2898, 3840.249, 3904.7732, 3311.9092, 3903.9888, 3955.341, 4065.4204]
2025-09-12 10:04:53,380 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 10:04:53,390 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 15 hours, 12 minutes, 52 seconds)
2025-09-12 10:15:43,259 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:15:43,264 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:20:10,394 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3281.40430 ± 918.947
2025-09-12 10:20:10,395 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3432.627, 3774.6697, 3937.016, 3732.613, 3592.8643, 3721.93, 3565.8538, 3441.328, 623.6964, 2991.4438]
2025-09-12 10:20:10,395 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 10:20:10,404 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 14 hours, 58 minutes, 50 seconds)
2025-09-12 10:31:01,763 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:31:01,770 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:35:27,808 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3118.62085 ± 1262.448
2025-09-12 10:35:27,816 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3839.214, 3857.3608, 1214.9576, 3309.4639, 3653.1106, 3654.5947, 127.07787, 3850.808, 4078.348, 3601.2761]
2025-09-12 10:35:27,816 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 10:35:27,828 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 14 hours, 45 minutes, 11 seconds)
2025-09-12 10:46:17,058 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:46:17,062 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:50:42,793 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3355.63672 ± 950.790
2025-09-12 10:50:42,796 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3723.8442, 4188.3193, 2497.614, 3860.8384, 4017.1245, 3259.7424, 3674.5996, 835.73004, 3674.051, 3824.5032]
2025-09-12 10:50:42,796 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 10:50:42,806 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 14 hours, 30 minutes, 33 seconds)
2025-09-12 11:01:33,560 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:01:33,564 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:05:56,711 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2933.33936 ± 892.569
2025-09-12 11:05:56,711 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1257.1382, 3905.2458, 2562.8047, 1541.2174, 3612.1636, 3647.9639, 3455.7292, 2483.814, 3125.3132, 3742.0037]
2025-09-12 11:05:56,711 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 11:05:56,718 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 14 hours, 14 minutes, 59 seconds)
2025-09-12 11:16:46,869 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:16:46,873 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:21:11,540 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2961.68506 ± 1097.069
2025-09-12 11:21:11,546 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3729.7834, 3638.7646, 2748.5742, 1511.6702, 4078.8025, 3850.3203, 3945.2937, 3146.3542, 2329.459, 637.82904]
2025-09-12 11:21:11,546 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 11:21:11,557 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 13 hours, 59 minutes, 19 seconds)
2025-09-12 11:32:01,240 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:32:01,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:36:25,117 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3393.49292 ± 887.434
2025-09-12 11:36:25,119 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3936.9998, 3254.055, 1068.6112, 3977.339, 3732.212, 2510.815, 3932.9028, 3816.2585, 3732.453, 3973.2856]
2025-09-12 11:36:25,119 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 11:36:25,130 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 13 hours, 43 minutes, 27 seconds)
2025-09-12 11:47:17,042 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:47:17,047 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:51:41,150 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3626.51904 ± 783.583
2025-09-12 11:51:41,151 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3848.6443, 4034.8018, 4058.593, 4143.3657, 3721.419, 3794.8655, 3897.321, 3770.6138, 3679.0186, 1316.5482]
2025-09-12 11:51:41,151 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 11:51:41,159 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 13 hours, 27 minutes, 57 seconds)
2025-09-12 12:02:34,468 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:02:34,473 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:07:00,229 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3752.51514 ± 292.175
2025-09-12 12:07:00,231 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3325.8518, 4030.3936, 3944.2473, 4094.9436, 3851.7417, 3730.5552, 3276.1355, 3471.764, 3698.5957, 4100.927]
2025-09-12 12:07:00,231 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 12:07:00,231 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1226 [INFO]: New best (3752.52) for latency MM1Queue_a033_s075
2025-09-12 12:07:00,243 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 13 hours, 13 minutes, 25 seconds)
2025-09-12 12:18:02,383 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:18:02,388 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:22:24,571 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3809.40381 ± 234.770
2025-09-12 12:22:24,577 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3584.435, 4007.1511, 3580.409, 3731.8865, 3951.7324, 3723.282, 3540.9836, 3886.0583, 3738.0344, 4350.0645]
2025-09-12 12:22:24,577 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 12:22:24,577 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1226 [INFO]: New best (3809.40) for latency MM1Queue_a033_s075
2025-09-12 12:22:24,588 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 12 hours, 59 minutes, 56 seconds)
2025-09-12 12:33:15,087 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:33:15,093 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:37:36,824 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3392.82373 ± 1113.159
2025-09-12 12:37:36,830 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [4142.182, 3830.4194, 3248.0515, 3718.3967, 495.79037, 3908.5032, 2258.532, 3973.5103, 4194.3135, 4158.5386]
2025-09-12 12:37:36,830 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 12:37:36,838 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 12 hours, 44 minutes, 12 seconds)
2025-09-12 12:48:27,897 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:48:27,902 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:52:57,656 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3849.00317 ± 155.806
2025-09-12 12:52:57,657 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3889.4814, 3819.246, 3669.7964, 3781.7131, 3881.549, 3908.905, 4138.626, 3905.043, 3534.0017, 3961.671]
2025-09-12 12:52:57,658 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 12:52:57,658 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1226 [INFO]: New best (3849.00) for latency MM1Queue_a033_s075
2025-09-12 12:52:57,667 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 12 hours, 30 minutes, 6 seconds)
2025-09-12 13:03:54,079 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:03:54,087 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:08:14,558 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3621.71948 ± 455.451
2025-09-12 13:08:14,568 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [4008.7146, 3171.7522, 4251.191, 3410.781, 3815.2996, 3798.7585, 3940.7598, 2580.193, 3505.846, 3733.9001]
2025-09-12 13:08:14,568 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 13:08:14,581 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 12 hours, 14 minutes, 56 seconds)
2025-09-12 13:19:04,192 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:19:04,199 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:23:27,064 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3384.81299 ± 1340.530
2025-09-12 13:23:27,065 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [4087.937, 4189.4595, 4052.3562, 3815.704, 4089.3928, 802.38257, 4139.489, 4262.683, 3770.3406, 638.3842]
2025-09-12 13:23:27,065 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 13:23:27,076 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 11 hours, 58 minutes, 36 seconds)
2025-09-12 13:34:11,614 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:34:11,619 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:38:31,462 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2817.10889 ± 1375.791
2025-09-12 13:38:31,469 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2429.797, 430.48416, 3623.2236, 3917.1204, 4145.622, 3792.6052, 3952.931, 507.0827, 3646.581, 1725.6394]
2025-09-12 13:38:31,469 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 13:38:31,477 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 11 hours, 40 minutes, 15 seconds)
2025-09-12 13:49:09,928 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:49:09,949 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:53:32,277 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3575.75635 ± 1190.929
2025-09-12 13:53:32,282 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [4397.176, 4232.7773, 3772.554, 4060.7627, 65.12556, 3573.7463, 3910.3127, 4031.95, 3841.5144, 3871.6426]
2025-09-12 13:53:32,283 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 13:53:32,295 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 11 hours, 23 minutes, 19 seconds)
2025-09-12 14:04:10,598 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:04:10,605 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:08:27,702 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3584.61572 ± 459.326
2025-09-12 14:08:27,703 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3802.6177, 2383.2852, 3791.3418, 3904.9294, 3474.926, 3732.7632, 3654.0671, 3244.7778, 3725.4626, 4131.9883]
2025-09-12 14:08:27,703 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 14:08:27,714 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 11 hours, 4 minutes, 24 seconds)
2025-09-12 14:19:06,647 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:19:06,653 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:23:30,839 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3264.04492 ± 1016.267
2025-09-12 14:23:30,845 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3940.1123, 3953.5906, 1779.6199, 3434.021, 4126.9067, 852.6813, 3592.397, 3779.9512, 3681.7, 3499.4714]
2025-09-12 14:23:30,845 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 14:23:30,853 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 10 hours, 47 minutes, 19 seconds)
2025-09-12 14:34:11,435 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:34:11,468 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:38:29,917 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3324.28955 ± 1187.383
2025-09-12 14:38:29,919 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [980.46234, 3715.1836, 3860.3455, 990.89185, 4120.9106, 4075.6208, 3536.556, 4280.133, 3723.5576, 3959.2354]
2025-09-12 14:38:29,919 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 14:38:29,929 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 10 hours, 30 minutes, 23 seconds)
2025-09-12 14:49:08,927 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:49:08,934 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:53:27,371 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3836.62109 ± 225.164
2025-09-12 14:53:27,372 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [4225.186, 4033.2158, 3713.8953, 3772.41, 3820.0596, 3449.5862, 3686.9043, 4105.468, 3932.9382, 3626.5474]
2025-09-12 14:53:27,372 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 14:53:27,379 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 10 hours, 14 minutes, 26 seconds)
2025-09-12 15:04:05,777 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:04:05,781 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:08:22,869 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3789.04028 ± 368.717
2025-09-12 15:08:22,870 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [4024.3447, 3490.941, 3915.112, 3985.9216, 3600.619, 3988.2039, 2842.1902, 3868.2627, 4135.473, 4039.3357]
2025-09-12 15:08:22,870 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 15:08:22,881 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 9 hours, 58 minutes, 44 seconds)
2025-09-12 15:19:01,712 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:19:01,717 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:23:20,619 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3389.30908 ± 1081.715
2025-09-12 15:23:20,621 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3947.8152, 3694.6072, 3345.0103, 3628.648, 3637.3281, 3722.3096, 4030.7788, 3672.7556, 4015.2444, 198.5941]
2025-09-12 15:23:20,621 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 15:23:20,630 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 9 hours, 44 minutes, 4 seconds)
2025-09-12 15:34:00,808 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:34:00,813 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:38:22,107 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3778.19336 ± 379.703
2025-09-12 15:38:22,111 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3741.9546, 4120.241, 3767.7517, 4365.045, 2822.6252, 3810.508, 3765.508, 3606.1296, 3822.0876, 3960.0806]
2025-09-12 15:38:22,111 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 15:38:22,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 9 hours, 28 minutes, 53 seconds)
2025-09-12 15:49:02,449 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:49:02,454 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:53:22,213 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3552.89697 ± 959.034
2025-09-12 15:53:22,217 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2198.7664, 4183.5537, 4299.822, 3999.039, 3796.725, 1236.1326, 3943.7725, 4216.5674, 3688.6494, 3965.9424]
2025-09-12 15:53:22,217 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 15:53:22,229 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 9 hours, 14 minutes, 3 seconds)
2025-09-12 16:04:03,825 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:04:03,830 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:08:21,299 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2757.38184 ± 1699.465
2025-09-12 16:08:21,305 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3698.1548, 3843.5242, 291.49878, 3616.462, 4189.093, 3851.349, 3706.3455, 125.97697, 109.64102, 4141.7734]
2025-09-12 16:08:21,305 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 16:08:21,315 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 8 hours, 59 minutes, 16 seconds)
2025-09-12 16:19:02,065 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:19:02,077 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:23:25,936 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3937.40479 ± 239.359
2025-09-12 16:23:25,938 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3693.6846, 3623.7124, 4023.5757, 3648.0168, 4113.552, 3895.7134, 4006.791, 4138.797, 4417.3115, 3812.8901]
2025-09-12 16:23:25,938 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 16:23:25,938 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1226 [INFO]: New best (3937.40) for latency MM1Queue_a033_s075
2025-09-12 16:23:25,946 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 8 hours, 45 minutes, 21 seconds)
2025-09-12 16:34:06,846 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:34:06,852 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:38:30,570 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2566.48340 ± 1581.072
2025-09-12 16:38:30,571 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1444.2695, 4048.9453, 4216.518, 3995.5, 1353.8175, 2419.3667, 4101.0215, 389.7045, 3739.6536, -43.963142]
2025-09-12 16:38:30,571 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 16:38:30,583 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 8 hours, 31 minutes, 7 seconds)
2025-09-12 16:49:10,520 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:49:10,525 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:53:28,264 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3740.29443 ± 429.259
2025-09-12 16:53:28,284 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3946.6755, 4007.445, 3660.7104, 4302.691, 4003.4346, 3838.114, 3836.3977, 3546.802, 3648.2524, 2612.4236]
2025-09-12 16:53:28,284 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 16:53:28,299 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 8 hours, 15 minutes, 40 seconds)
2025-09-12 17:04:07,669 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:04:07,674 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:08:24,864 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3935.45898 ± 409.298
2025-09-12 17:08:24,878 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3813.0752, 3930.9412, 4444.1255, 3043.2502, 3565.889, 3983.033, 4237.98, 4362.0757, 4293.517, 3680.7036]
2025-09-12 17:08:24,878 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 17:08:24,890 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 8 hours, 17 seconds)
2025-09-12 17:19:04,917 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:19:04,923 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:23:22,144 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3682.72998 ± 849.086
2025-09-12 17:23:22,145 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3568.451, 4133.408, 4108.954, 4159.0674, 3993.6045, 3919.3877, 4072.41, 3690.8315, 1195.1277, 3986.058]
2025-09-12 17:23:22,145 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 17:23:22,157 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 7 hours, 45 minutes, 5 seconds)
2025-09-12 17:34:02,412 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:34:02,422 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:38:20,552 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3616.32300 ± 891.267
2025-09-12 17:38:20,554 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1017.02075, 4108.389, 3734.2349, 3807.3235, 4169.067, 4171.598, 3655.1528, 3944.355, 3533.7056, 4022.3853]
2025-09-12 17:38:20,554 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 17:38:20,565 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 7 hours, 29 minutes, 27 seconds)
2025-09-12 17:49:00,818 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:49:00,823 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:53:21,931 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3611.50635 ± 830.499
2025-09-12 17:53:21,933 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3913.979, 2197.879, 3254.2542, 3526.6772, 4278.3047, 4277.4463, 2013.3441, 3997.4263, 4493.872, 4161.8804]
2025-09-12 17:53:21,933 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 17:53:21,945 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 7 hours, 14 minutes, 9 seconds)
2025-09-12 18:04:02,565 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:04:02,573 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:08:20,574 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3967.67896 ± 243.168
2025-09-12 18:08:20,580 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3509.2903, 4178.589, 4235.117, 3698.1855, 3961.618, 4097.1475, 4300.795, 4049.3254, 3732.6191, 3914.1035]
2025-09-12 18:08:20,580 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 18:08:20,580 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1226 [INFO]: New best (3967.68) for latency MM1Queue_a033_s075
2025-09-12 18:08:20,595 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 6 hours, 59 minutes, 16 seconds)
2025-09-12 18:19:01,401 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:19:01,412 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:23:20,239 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3970.45654 ± 226.085
2025-09-12 18:23:20,240 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3992.681, 3915.57, 3823.1062, 3885.545, 3441.2327, 4352.8105, 4119.933, 4082.9202, 4101.1665, 3989.6025]
2025-09-12 18:23:20,240 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 18:23:20,240 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1226 [INFO]: New best (3970.46) for latency MM1Queue_a033_s075
2025-09-12 18:23:20,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 6 hours, 44 minutes, 34 seconds)
2025-09-12 18:34:01,266 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:34:01,272 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:38:24,620 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3674.63989 ± 940.717
2025-09-12 18:38:24,622 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3870.4265, 3831.0835, 4080.7432, 3854.2214, 4214.83, 3753.0642, 894.4908, 4224.4746, 3883.0479, 4140.0146]
2025-09-12 18:38:24,622 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 18:38:24,632 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 6 hours, 30 minutes, 12 seconds)
2025-09-12 18:49:05,726 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:49:05,733 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:53:27,429 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3600.83154 ± 655.059
2025-09-12 18:53:27,435 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3991.0654, 3962.4766, 1690.5586, 3708.499, 3941.5205, 3681.5242, 3550.9917, 3616.959, 3964.525, 3900.1948]
2025-09-12 18:53:27,435 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 18:53:27,446 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 6 hours, 15 minutes, 34 seconds)
2025-09-12 19:04:07,792 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:04:07,803 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:08:30,264 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3660.72266 ± 945.399
2025-09-12 19:08:30,265 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3839.2163, 1414.1215, 4338.362, 4163.6406, 4212.0674, 4106.4536, 4283.3657, 3858.0361, 4140.7974, 2251.164]
2025-09-12 19:08:30,265 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 19:08:30,284 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 6 hours, 40 seconds)
2025-09-12 19:19:10,968 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:19:10,974 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:23:28,952 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3857.32227 ± 630.841
2025-09-12 19:23:28,953 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3676.5142, 4319.022, 4276.7197, 2085.6357, 3957.3528, 3818.2773, 3824.422, 4374.4375, 4136.445, 4104.397]
2025-09-12 19:23:28,953 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 19:23:28,966 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 5 hours, 45 minutes, 38 seconds)
2025-09-12 19:34:09,798 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:34:09,802 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:38:30,898 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3149.01025 ± 1467.330
2025-09-12 19:38:30,900 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [4263.445, 564.12427, 4013.0005, 1138.6272, 4338.0186, 3893.185, 4465.3374, 3569.5586, 4103.4663, 1141.3396]
2025-09-12 19:38:30,900 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 19:38:30,914 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 5 hours, 30 minutes, 46 seconds)
2025-09-12 19:49:12,059 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:49:12,066 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:53:30,599 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3761.06689 ± 909.613
2025-09-12 19:53:30,602 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [4140.9727, 3512.9583, 4026.844, 4037.532, 3973.765, 3906.9941, 4540.018, 4164.048, 4175.791, 1131.7474]
2025-09-12 19:53:30,602 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 19:53:30,612 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 5 hours, 15 minutes, 25 seconds)
2025-09-12 20:04:11,874 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:04:11,879 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:08:31,704 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3891.76489 ± 682.254
2025-09-12 20:08:31,705 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3906.5898, 4057.9602, 1970.585, 3841.3955, 4150.381, 3696.0552, 4391.401, 4489.3804, 4100.305, 4313.5996]
2025-09-12 20:08:31,705 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:08:31,715 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 5 hours, 17 seconds)
2025-09-12 20:19:12,755 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:19:12,761 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:23:35,165 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3984.30591 ± 240.860
2025-09-12 20:23:35,168 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3912.8499, 4581.399, 3997.721, 4003.3574, 3739.6724, 4029.91, 3986.27, 3927.4153, 4063.0466, 3601.4182]
2025-09-12 20:23:35,168 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:23:35,168 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1226 [INFO]: New best (3984.31) for latency MM1Queue_a033_s075
2025-09-12 20:23:35,181 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 4 hours, 45 minutes, 18 seconds)
2025-09-12 20:34:15,650 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:34:15,654 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:38:35,156 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3161.16772 ± 1274.069
2025-09-12 20:38:35,169 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [4055.5166, 3883.574, 4142.0044, 4091.7373, 3890.768, 4050.1228, 3813.0217, 1359.6931, 1388.024, 937.2167]
2025-09-12 20:38:35,169 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:38:35,188 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 4 hours, 30 minutes, 22 seconds)
2025-09-12 20:49:16,333 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:49:16,343 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:53:39,385 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3393.32764 ± 1114.403
2025-09-12 20:53:39,389 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3074.66, 3826.9954, 4049.871, 202.84631, 4011.499, 3264.7368, 4096.7764, 3740.624, 3588.705, 4076.5632]
2025-09-12 20:53:39,389 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:53:39,407 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 4 hours, 15 minutes, 28 seconds)
2025-09-12 21:04:20,166 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:04:20,173 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:08:42,549 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3836.97583 ± 468.920
2025-09-12 21:08:42,554 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3969.2651, 3914.418, 4110.9087, 2549.5579, 3848.5386, 3898.1333, 3569.9326, 4023.964, 4286.077, 4198.959]
2025-09-12 21:08:42,554 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:08:42,569 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 4 hours, 38 seconds)
2025-09-12 21:19:24,339 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:19:24,344 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:23:43,070 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3834.69067 ± 386.682
2025-09-12 21:23:43,072 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3812.0764, 3712.36, 4178.0283, 3858.7385, 4217.0317, 3571.989, 2844.9045, 3933.831, 4053.6003, 4164.346]
2025-09-12 21:23:43,072 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:23:43,084 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 3 hours, 45 minutes, 34 seconds)
2025-09-12 21:34:23,326 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:34:23,334 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:38:42,672 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3490.01709 ± 1258.539
2025-09-12 21:38:42,675 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [4360.434, 2920.2458, 4125.4, 2670.4346, 4091.9583, 4132.0825, 3935.2788, 4151.5127, 4404.38, 108.444336]
2025-09-12 21:38:42,675 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:38:42,689 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 3 hours, 30 minutes, 21 seconds)
2025-09-12 21:49:24,272 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:49:24,278 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:53:44,528 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3830.66870 ± 309.922
2025-09-12 21:53:44,537 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3031.3218, 3983.0938, 3786.9275, 3939.8513, 3923.6055, 4037.1306, 3689.1404, 4273.874, 3729.8289, 3911.9133]
2025-09-12 21:53:44,537 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:53:44,556 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 3 hours, 15 minutes, 24 seconds)
2025-09-12 22:04:26,240 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:04:26,254 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:08:48,632 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 4113.15332 ± 177.075
2025-09-12 22:08:48,638 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3796.0833, 4149.1626, 4336.5806, 4174.0137, 4097.777, 3866.2878, 4152.231, 4173.9517, 4391.5166, 3993.9302]
2025-09-12 22:08:48,638 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:08:48,638 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1226 [INFO]: New best (4113.15) for latency MM1Queue_a033_s075
2025-09-12 22:08:48,653 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 3 hours, 22 seconds)
2025-09-12 22:19:30,447 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:19:30,453 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:23:49,132 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 2908.85181 ± 1403.583
2025-09-12 22:23:49,138 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3775.986, 4292.856, 4108.8965, 3771.5603, 2202.6814, 3791.4539, 945.56793, 2326.1843, 3871.4448, 1.8852438]
2025-09-12 22:23:49,139 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:23:49,153 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 2 hours, 45 minutes, 14 seconds)
2025-09-12 22:34:30,838 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:34:30,852 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:38:50,379 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 4173.80615 ± 96.846
2025-09-12 22:38:50,380 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [4188.898, 4074.2153, 4194.9175, 4258.266, 4165.7803, 4379.671, 4098.442, 4139.001, 4017.4114, 4221.4595]
2025-09-12 22:38:50,380 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:38:50,380 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1226 [INFO]: New best (4173.81) for latency MM1Queue_a033_s075
2025-09-12 22:38:50,394 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 2 hours, 30 minutes, 14 seconds)
2025-09-12 22:49:31,809 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:49:31,822 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:53:49,223 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3593.97119 ± 942.419
2025-09-12 22:53:49,227 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [4185.3403, 4115.5474, 4260.712, 2079.5898, 4073.6672, 4417.131, 4015.6035, 3298.3086, 1539.8549, 3953.9587]
2025-09-12 22:53:49,227 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:53:49,244 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 2 hours, 15 minutes, 11 seconds)
2025-09-12 23:04:28,951 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:04:28,963 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 23:08:52,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3410.94409 ± 1299.640
2025-09-12 23:08:52,097 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [291.28052, 3902.5056, 4236.1206, 1499.7742, 3877.6233, 4035.85, 3677.1755, 4384.495, 4090.739, 4113.878]
2025-09-12 23:08:52,097 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 23:08:52,114 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 2 hours, 12 seconds)
2025-09-12 23:19:33,822 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:19:33,845 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 23:23:54,791 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3984.05469 ± 218.178
2025-09-12 23:23:54,792 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [3837.4023, 4305.372, 3988.278, 3481.446, 3820.114, 3973.9883, 4062.3777, 4066.3518, 4121.574, 4183.6406]
2025-09-12 23:23:54,792 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 23:23:54,807 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 45 minutes, 8 seconds)
2025-09-12 23:34:34,882 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:34:34,888 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 23:38:57,574 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3847.82422 ± 720.778
2025-09-12 23:38:57,575 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1775.4304, 4120.5693, 4088.9463, 3819.4866, 4227.9805, 3799.8718, 3770.4138, 4360.61, 4346.0093, 4168.925]
2025-09-12 23:38:57,575 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 23:38:57,591 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 30 minutes, 10 seconds)
2025-09-12 23:49:39,932 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:49:39,939 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 23:54:03,555 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3600.00122 ± 1207.082
2025-09-12 23:54:03,562 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [4109.831, 3823.175, 4484.3345, 4201.0615, 522.17035, 2109.6804, 4039.7922, 4144.3027, 4259.136, 4306.528]
2025-09-12 23:54:03,562 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 23:54:03,574 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 15 minutes, 13 seconds)
2025-09-13 00:04:46,407 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:04:46,418 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 00:09:07,968 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3398.49927 ± 1630.354
2025-09-13 00:09:07,976 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [4416.7944, 4335.758, 215.61295, 4312.3496, 98.28745, 4291.9893, 4129.6763, 4343.2896, 4056.8003, 3784.4355]
2025-09-13 00:09:07,976 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 00:09:07,999 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 1 hour, 15 seconds)
2025-09-13 00:19:48,464 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:19:48,472 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 00:24:07,726 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3543.09839 ± 1047.578
2025-09-13 00:24:07,727 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [4103.191, 3968.1506, 3974.1936, 621.8373, 2860.4717, 3893.9263, 4369.4165, 4023.916, 4049.6975, 3566.1826]
2025-09-13 00:24:07,727 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 00:24:07,739 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 45 minutes, 9 seconds)
2025-09-13 00:34:40,572 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:34:40,578 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 00:38:57,289 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3975.65479 ± 255.688
2025-09-13 00:38:57,296 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [4190.807, 4310.3433, 3996.1025, 3523.2288, 4279.075, 3605.4148, 3983.6702, 3759.3452, 4022.7156, 4085.8423]
2025-09-13 00:38:57,296 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 00:38:57,314 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 30 minutes, 1 second)
2025-09-13 00:49:31,708 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:49:31,720 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 00:53:52,103 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3973.31787 ± 166.345
2025-09-13 00:53:52,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [4012.298, 3972.36, 3903.295, 3900.4155, 4230.609, 4095.817, 4223.4307, 3807.123, 3914.787, 3673.0476]
2025-09-13 00:53:52,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 00:53:52,137 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 14 minutes, 58 seconds)
2025-09-13 01:04:26,397 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:04:26,403 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 01:08:45,706 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 3480.89795 ± 1220.716
2025-09-13 01:08:45,710 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [4002.477, 3426.4985, 4105.797, 3565.6558, 4060.5125, 3860.0627, -121.64836, 4006.4866, 3791.2136, 4111.9263]
2025-09-13 01:08:45,710 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 01:08:45,725 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-halfcheetah):1251 [DEBUG]: Training session finished
