2025-09-11 23:55:40,308 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noiseperc25-halfcheetah/MM1Queue_a033_s075-mbpac_memdelay
2025-09-11 23:55:40,308 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noiseperc25-halfcheetah/MM1Queue_a033_s075-mbpac_memdelay
2025-09-11 23:55:40,308 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x14cd7eeb1b90>}
2025-09-11 23:55:40,308 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1111 [DEBUG]: using device: cuda
2025-09-11 23:55:40,313 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1133 [INFO]: Creating new trainer
2025-09-11 23:55:40,330 baseline-mbpac-noiseperc25-halfcheetah:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-09-11 23:55:40,330 baseline-mbpac-noiseperc25-halfcheetah:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-11 23:55:40,338 baseline-mbpac-noiseperc25-halfcheetah:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=6, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 384, batch_first=True)
)
2025-09-11 23:55:41,397 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1194 [DEBUG]: Starting training session...
2025-09-11 23:55:41,397 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 1/100
2025-09-12 00:05:29,012 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:05:29,016 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 00:09:55,318 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: -323.84793 ± 38.648
2025-09-12 00:09:55,319 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [-342.43124, -311.4873, -358.7929, -354.27145, -248.90733, -355.16174, -261.15482, -365.39288, -317.6464, -323.23315]
2025-09-12 00:09:55,319 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 00:09:55,319 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (-323.85) for latency MM1Queue_a033_s075
2025-09-12 00:09:55,322 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 23 hours, 28 minutes, 58 seconds)
2025-09-12 00:20:49,923 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:20:49,929 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 00:25:12,880 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: -230.97385 ± 34.679
2025-09-12 00:25:12,881 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [-158.37471, -258.5883, -243.46426, -238.1249, -200.88068, -200.37729, -242.83597, -277.1124, -219.6706, -270.3094]
2025-09-12 00:25:12,881 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 00:25:12,881 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (-230.97) for latency MM1Queue_a033_s075
2025-09-12 00:25:12,885 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 24 hours, 6 minutes, 42 seconds)
2025-09-12 00:36:07,437 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:36:07,442 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 00:40:34,818 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: -106.03658 ± 47.549
2025-09-12 00:40:34,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [-69.115616, -124.740746, -60.718716, -112.35383, -177.87134, -101.37684, -74.75419, -26.934233, -182.15636, -130.34386]
2025-09-12 00:40:34,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 00:40:34,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (-106.04) for latency MM1Queue_a033_s075
2025-09-12 00:40:34,836 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 24 hours, 11 minutes, 27 seconds)
2025-09-12 00:51:31,558 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:51:31,564 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 00:55:57,996 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: -42.01056 ± 46.099
2025-09-12 00:55:57,997 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [-151.15694, -22.586256, -36.101845, -62.627224, 12.806384, -7.6728444, -62.39362, -64.85106, -41.646473, 16.124298]
2025-09-12 00:55:57,997 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 00:55:57,997 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (-42.01) for latency MM1Queue_a033_s075
2025-09-12 00:55:58,002 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 24 hours, 6 minutes, 38 seconds)
2025-09-12 01:06:54,751 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:06:54,756 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:11:22,190 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 35.59224 ± 32.622
2025-09-12 01:11:22,192 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [26.574553, 21.783682, 41.955017, -13.349605, -27.38982, 71.01006, 44.68658, 70.10537, 70.77732, 49.769234]
2025-09-12 01:11:22,192 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 01:11:22,192 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (35.59) for latency MM1Queue_a033_s075
2025-09-12 01:11:22,198 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 23 hours, 57 minutes, 55 seconds)
2025-09-12 01:22:19,626 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:22:19,630 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:26:41,801 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 167.14775 ± 82.975
2025-09-12 01:26:41,801 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [167.58476, 91.634224, 184.16245, 165.346, 68.19162, 60.739147, 364.21915, 182.29797, 168.48418, 218.81813]
2025-09-12 01:26:41,801 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 01:26:41,801 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (167.15) for latency MM1Queue_a033_s075
2025-09-12 01:26:41,805 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 24 hours, 3 minutes, 21 seconds)
2025-09-12 01:37:39,044 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:37:39,049 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:42:03,902 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 224.43047 ± 110.675
2025-09-12 01:42:03,903 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [254.66896, 524.14606, 104.86341, 206.9637, 144.60083, 154.32066, 173.5413, 234.81042, 183.46758, 262.922]
2025-09-12 01:42:03,903 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 01:42:03,903 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (224.43) for latency MM1Queue_a033_s075
2025-09-12 01:42:03,907 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 23 hours, 49 minutes, 24 seconds)
2025-09-12 01:53:02,534 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:53:02,539 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:57:26,908 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 494.82867 ± 143.389
2025-09-12 01:57:26,919 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [629.2276, 301.1333, 464.72568, 555.29095, 404.05566, 301.85538, 766.3904, 377.94632, 580.64386, 567.0174]
2025-09-12 01:57:26,919 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 01:57:26,920 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (494.83) for latency MM1Queue_a033_s075
2025-09-12 01:57:26,931 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 23 hours, 34 minutes, 22 seconds)
2025-09-12 02:08:26,237 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:08:26,248 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:12:49,624 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 486.87646 ± 126.445
2025-09-12 02:12:49,625 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [395.8924, 411.37198, 582.0241, 610.917, 468.01218, 459.04208, 689.13446, 619.96027, 357.78592, 274.6241]
2025-09-12 02:12:49,625 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 02:12:49,629 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 23 hours, 18 minutes, 51 seconds)
2025-09-12 02:23:48,541 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:23:48,547 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:28:12,729 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 685.98615 ± 143.770
2025-09-12 02:28:12,730 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [815.6274, 718.14087, 550.06964, 792.4676, 633.1317, 618.4548, 823.3358, 702.8456, 359.24207, 846.54565]
2025-09-12 02:28:12,730 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 02:28:12,730 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (685.99) for latency MM1Queue_a033_s075
2025-09-12 02:28:12,735 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 23 hours, 3 minutes, 9 seconds)
2025-09-12 02:39:11,575 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:39:11,581 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:43:34,675 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 738.00763 ± 138.170
2025-09-12 02:43:34,676 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [617.5383, 751.2774, 839.98883, 733.6762, 944.62415, 754.9232, 723.45953, 791.4542, 402.06622, 821.0678]
2025-09-12 02:43:34,676 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 02:43:34,676 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (738.01) for latency MM1Queue_a033_s075
2025-09-12 02:43:34,686 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 22 hours, 48 minutes, 29 seconds)
2025-09-12 02:54:33,418 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:54:33,424 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:58:55,928 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 733.46625 ± 183.568
2025-09-12 02:58:55,928 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [541.23285, 975.3599, 550.874, 816.0878, 959.6565, 544.6848, 839.51685, 627.73944, 954.27374, 525.2362]
2025-09-12 02:58:55,928 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 02:58:55,932 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 22 hours, 32 minutes, 51 seconds)
2025-09-12 03:09:37,414 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:09:37,419 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:13:54,434 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 723.62109 ± 212.718
2025-09-12 03:13:54,435 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1018.51917, 993.25037, 581.9544, 283.11273, 717.15875, 644.0914, 710.0236, 866.1902, 871.4388, 550.471]
2025-09-12 03:13:54,435 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 03:13:54,443 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 22 hours, 10 minutes, 22 seconds)
2025-09-12 03:24:33,401 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:24:33,406 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:28:49,284 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 709.26074 ± 289.135
2025-09-12 03:28:49,285 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1119.8097, 988.1697, 787.05316, 485.6326, 375.45374, 1081.4834, 427.6776, 414.96594, 463.59766, 948.7634]
2025-09-12 03:28:49,285 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 03:28:49,288 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 21 hours, 47 minutes, 6 seconds)
2025-09-12 03:39:28,112 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:39:28,116 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:43:47,706 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 942.51233 ± 166.407
2025-09-12 03:43:47,707 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1244.3878, 1092.1951, 754.45575, 933.43976, 960.62933, 977.37823, 892.9775, 948.80444, 599.47485, 1021.37994]
2025-09-12 03:43:47,707 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 03:43:47,707 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (942.51) for latency MM1Queue_a033_s075
2025-09-12 03:43:47,713 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 21 hours, 24 minutes, 54 seconds)
2025-09-12 03:54:28,217 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:54:28,222 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:58:47,405 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1013.60529 ± 224.348
2025-09-12 03:58:47,406 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1244.8385, 1082.4658, 1139.4259, 930.21124, 1235.2368, 980.3594, 407.58817, 1070.3572, 1049.7249, 995.84424]
2025-09-12 03:58:47,406 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 03:58:47,406 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (1013.61) for latency MM1Queue_a033_s075
2025-09-12 03:58:47,413 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 21 hours, 3 minutes, 33 seconds)
2025-09-12 04:09:27,459 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:09:27,464 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:13:42,453 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 967.05646 ± 249.432
2025-09-12 04:13:42,454 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [455.206, 1176.2263, 965.7235, 1061.4886, 517.7984, 1152.6449, 1167.9865, 1005.37463, 1115.7384, 1052.3766]
2025-09-12 04:13:42,454 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 04:13:42,460 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 20 hours, 41 minutes, 16 seconds)
2025-09-12 04:24:22,345 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:24:22,350 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:28:41,049 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1067.73169 ± 169.429
2025-09-12 04:28:41,050 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [930.0578, 639.63666, 1197.5631, 1083.0146, 1230.1196, 1119.7872, 999.409, 1089.0996, 1166.9006, 1221.7296]
2025-09-12 04:28:41,050 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 04:28:41,050 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (1067.73) for latency MM1Queue_a033_s075
2025-09-12 04:28:41,084 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 20 hours, 26 minutes, 20 seconds)
2025-09-12 04:39:21,382 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:39:21,394 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:43:35,921 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1170.23914 ± 88.858
2025-09-12 04:43:35,922 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1137.6871, 1170.3158, 1271.0863, 1249.207, 1125.0143, 1056.8871, 1339.2711, 1172.5691, 1039.1868, 1141.1671]
2025-09-12 04:43:35,922 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 04:43:35,922 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (1170.24) for latency MM1Queue_a033_s075
2025-09-12 04:43:35,932 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 20 hours, 11 minutes, 23 seconds)
2025-09-12 04:54:17,327 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:54:17,332 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:58:36,265 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1162.41626 ± 196.779
2025-09-12 04:58:36,266 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1258.3718, 1382.7202, 1345.1953, 1250.9869, 1299.4961, 875.3293, 1047.4948, 1310.1846, 1070.9146, 783.4694]
2025-09-12 04:58:36,267 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 04:58:36,275 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 19 hours, 56 minutes, 56 seconds)
2025-09-12 05:09:18,900 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:09:18,910 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:13:39,422 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1079.86377 ± 284.528
2025-09-12 05:13:39,423 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1152.7976, 1138.7439, 1333.9358, 1172.6857, 1231.018, 1232.8328, 420.42938, 1021.3048, 1394.5574, 700.332]
2025-09-12 05:13:39,423 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 05:13:39,433 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 19 hours, 42 minutes, 53 seconds)
2025-09-12 05:24:17,349 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:24:17,354 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:28:34,649 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1268.23169 ± 184.762
2025-09-12 05:28:34,650 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1415.6853, 1443.0238, 1292.6249, 1093.687, 1306.5028, 1202.882, 1255.1018, 841.70386, 1299.8536, 1531.2529]
2025-09-12 05:28:34,650 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 05:28:34,650 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (1268.23) for latency MM1Queue_a033_s075
2025-09-12 05:28:34,659 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 19 hours, 27 minutes, 58 seconds)
2025-09-12 05:39:12,808 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:39:12,812 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:43:26,385 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1154.77246 ± 307.503
2025-09-12 05:43:26,386 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1341.4952, 1324.3265, 648.9054, 1383.2721, 1284.7716, 1219.694, 464.7108, 1342.1433, 1181.853, 1356.5524]
2025-09-12 05:43:26,386 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 05:43:26,393 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 19 hours, 11 minutes, 13 seconds)
2025-09-12 05:54:04,605 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:54:04,610 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:58:22,325 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1296.19031 ± 71.667
2025-09-12 05:58:22,326 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1262.0437, 1204.5293, 1292.2909, 1358.511, 1331.9832, 1230.9482, 1242.7654, 1246.2869, 1455.3491, 1337.1959]
2025-09-12 05:58:22,326 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 05:58:22,326 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (1296.19) for latency MM1Queue_a033_s075
2025-09-12 05:58:22,333 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 18 hours, 56 minutes, 33 seconds)
2025-09-12 06:09:02,568 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:09:02,573 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:13:20,685 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1262.24768 ± 102.301
2025-09-12 06:13:20,686 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1256.5208, 1285.9783, 1065.8893, 1193.1302, 1384.7981, 1327.3663, 1328.2727, 1332.8992, 1344.5593, 1103.061]
2025-09-12 06:13:20,686 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 06:13:20,693 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 18 hours, 41 minutes, 6 seconds)
2025-09-12 06:24:02,054 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:24:02,060 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:28:17,811 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1258.59619 ± 246.214
2025-09-12 06:28:17,811 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1246.001, 1436.8636, 1562.4142, 1413.1906, 1026.2878, 1503.906, 1112.574, 1448.7053, 752.9919, 1083.0265]
2025-09-12 06:28:17,811 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 06:28:17,817 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 18 hours, 24 minutes, 40 seconds)
2025-09-12 06:38:59,995 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:38:59,999 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:43:20,078 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1061.36304 ± 431.409
2025-09-12 06:43:20,079 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1053.0142, 1523.3743, 509.55402, 659.5312, 1470.0004, 1421.5464, 357.4005, 1415.69, 740.8148, 1462.7041]
2025-09-12 06:43:20,079 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 06:43:20,097 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 18 hours, 11 minutes, 27 seconds)
2025-09-12 06:54:02,845 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:54:02,850 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:58:24,252 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1441.30090 ± 101.195
2025-09-12 06:58:24,261 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1580.0647, 1507.6042, 1573.5911, 1368.4486, 1214.1587, 1457.6096, 1444.1298, 1378.999, 1453.632, 1434.7717]
2025-09-12 06:58:24,261 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 06:58:24,261 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (1441.30) for latency MM1Queue_a033_s075
2025-09-12 06:58:24,270 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 17 hours, 59 minutes, 29 seconds)
2025-09-12 07:09:06,794 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:09:06,799 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:13:26,328 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1455.35608 ± 127.230
2025-09-12 07:13:26,329 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1527.945, 1356.8118, 1364.927, 1329.9507, 1528.0737, 1396.941, 1275.8059, 1577.0957, 1482.8197, 1713.1907]
2025-09-12 07:13:26,329 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 07:13:26,329 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (1455.36) for latency MM1Queue_a033_s075
2025-09-12 07:13:26,334 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 17 hours, 45 minutes, 56 seconds)
2025-09-12 07:24:08,016 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:24:08,021 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:28:28,162 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1431.31116 ± 241.068
2025-09-12 07:28:28,162 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1704.8116, 1522.3243, 1408.899, 1446.703, 1475.8593, 1569.693, 1354.4281, 766.65607, 1467.5371, 1596.1987]
2025-09-12 07:28:28,162 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 07:28:28,171 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 17 hours, 31 minutes, 44 seconds)
2025-09-12 07:39:10,135 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:39:10,145 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:43:24,136 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1405.66357 ± 130.913
2025-09-12 07:43:24,136 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1342.8018, 1390.1077, 1572.5499, 1468.0208, 1377.4072, 1086.7234, 1567.7118, 1480.5469, 1374.4902, 1396.2751]
2025-09-12 07:43:24,136 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 07:43:24,144 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 17 hours, 16 minutes, 27 seconds)
2025-09-12 07:54:01,492 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:54:01,498 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:58:17,915 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1338.70032 ± 249.506
2025-09-12 07:58:17,915 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [612.4263, 1470.3104, 1334.3, 1517.8722, 1404.6411, 1476.852, 1446.6573, 1336.5674, 1440.9819, 1346.3938]
2025-09-12 07:58:17,915 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 07:58:17,923 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 16 hours, 59 minutes, 30 seconds)
2025-09-12 08:08:58,138 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:08:58,142 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:13:16,190 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1406.03931 ± 340.652
2025-09-12 08:13:16,191 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [405.4233, 1686.8934, 1455.2972, 1480.2958, 1538.3485, 1555.5814, 1440.7242, 1502.3254, 1548.8584, 1446.6461]
2025-09-12 08:13:16,191 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 08:13:16,203 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 16 hours, 43 minutes, 11 seconds)
2025-09-12 08:23:56,967 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:23:56,972 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:28:13,265 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1474.21167 ± 343.239
2025-09-12 08:28:13,266 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1595.5536, 1466.651, 1776.9917, 487.988, 1605.0592, 1618.798, 1516.502, 1412.423, 1686.7246, 1575.426]
2025-09-12 08:28:13,266 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 08:28:13,266 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (1474.21) for latency MM1Queue_a033_s075
2025-09-12 08:28:13,279 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 16 hours, 27 minutes, 7 seconds)
2025-09-12 08:38:53,794 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:38:53,799 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:43:15,079 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1451.07727 ± 352.753
2025-09-12 08:43:15,085 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1473.1812, 1811.0142, 1600.7365, 1658.5336, 1802.1465, 1473.3375, 827.8414, 1650.1443, 1467.5994, 746.23724]
2025-09-12 08:43:15,085 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 08:43:15,101 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 16 hours, 12 minutes, 10 seconds)
2025-09-12 08:53:56,432 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:53:56,437 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:58:16,213 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1515.41382 ± 110.057
2025-09-12 08:58:16,229 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1621.074, 1557.9037, 1519.2162, 1608.7747, 1247.4474, 1428.108, 1594.99, 1444.5198, 1607.9451, 1524.1588]
2025-09-12 08:58:16,229 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 08:58:16,229 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (1515.41) for latency MM1Queue_a033_s075
2025-09-12 08:58:16,240 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 15 hours, 58 minutes, 18 seconds)
2025-09-12 09:08:58,566 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:08:58,571 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:13:16,425 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1600.60876 ± 193.570
2025-09-12 09:13:16,426 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1938.2821, 1745.2537, 1673.125, 1613.4823, 1142.9318, 1626.6768, 1636.0714, 1474.1594, 1523.6881, 1632.4167]
2025-09-12 09:13:16,426 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 09:13:16,426 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (1600.61) for latency MM1Queue_a033_s075
2025-09-12 09:13:16,438 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 15 hours, 44 minutes, 41 seconds)
2025-09-12 09:23:58,188 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:23:58,193 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:28:14,686 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1592.48425 ± 109.324
2025-09-12 09:28:14,687 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1554.1051, 1665.9203, 1527.9998, 1768.1537, 1603.4541, 1544.0492, 1682.4534, 1375.9954, 1701.4739, 1501.2383]
2025-09-12 09:28:14,687 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 09:28:14,703 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 15 hours, 29 minutes, 41 seconds)
2025-09-12 09:38:57,545 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:38:57,552 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:43:16,044 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1602.44165 ± 134.340
2025-09-12 09:43:16,045 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1664.1261, 1312.3013, 1478.2787, 1527.0309, 1796.0243, 1676.163, 1687.8739, 1703.7562, 1516.7289, 1662.1333]
2025-09-12 09:43:16,045 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 09:43:16,045 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (1602.44) for latency MM1Queue_a033_s075
2025-09-12 09:43:16,051 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 15 hours, 15 minutes, 33 seconds)
2025-09-12 09:53:58,189 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:53:58,193 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:58:17,554 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1445.21606 ± 412.366
2025-09-12 09:58:17,555 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [899.99347, 1455.5092, 1710.1229, 1688.1027, 1684.3181, 1675.8713, 424.30267, 1635.7107, 1569.1204, 1709.1086]
2025-09-12 09:58:17,555 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 09:58:17,577 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 15 hours, 29 seconds)
2025-09-12 10:09:00,778 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:09:00,782 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:13:20,845 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1685.04858 ± 107.200
2025-09-12 10:13:20,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1810.3771, 1550.2804, 1797.4766, 1658.0599, 1510.389, 1811.2483, 1615.257, 1602.5039, 1777.2448, 1717.6488]
2025-09-12 10:13:20,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 10:13:20,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (1685.05) for latency MM1Queue_a033_s075
2025-09-12 10:13:20,857 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 14 hours, 45 minutes, 54 seconds)
2025-09-12 10:23:59,295 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:23:59,300 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:28:17,960 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1657.73804 ± 92.895
2025-09-12 10:28:17,963 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1664.4119, 1679.8704, 1636.0552, 1613.6163, 1612.6498, 1714.433, 1852.9731, 1595.6663, 1726.9764, 1480.7279]
2025-09-12 10:28:17,963 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 10:28:17,974 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 14 hours, 30 minutes, 17 seconds)
2025-09-12 10:38:57,814 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:38:57,819 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:43:12,817 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1515.98706 ± 354.565
2025-09-12 10:43:12,819 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1356.1058, 1624.1622, 1645.1329, 1773.9188, 503.92575, 1602.2157, 1572.9197, 1666.198, 1654.1868, 1761.1051]
2025-09-12 10:43:12,819 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 10:43:12,827 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 14 hours, 14 minutes, 38 seconds)
2025-09-12 10:53:53,879 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:53:53,883 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:58:13,441 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1702.78589 ± 78.758
2025-09-12 10:58:13,442 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1711.9669, 1797.9141, 1715.4537, 1700.0535, 1744.0579, 1644.2129, 1661.2546, 1720.997, 1519.644, 1812.3052]
2025-09-12 10:58:13,443 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 10:58:13,443 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (1702.79) for latency MM1Queue_a033_s075
2025-09-12 10:58:13,451 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 13 hours, 59 minutes, 30 seconds)
2025-09-12 11:08:54,616 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:08:54,621 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:13:15,424 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1709.71680 ± 91.797
2025-09-12 11:13:15,425 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1728.8882, 1648.0648, 1590.8931, 1836.7706, 1718.938, 1724.3993, 1902.1255, 1618.0057, 1667.8375, 1661.2434]
2025-09-12 11:13:15,425 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 11:13:15,425 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (1709.72) for latency MM1Queue_a033_s075
2025-09-12 11:13:15,434 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 13 hours, 44 minutes, 36 seconds)
2025-09-12 11:23:56,573 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:23:56,580 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:28:13,263 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1594.22925 ± 326.696
2025-09-12 11:28:13,264 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1726.7019, 1460.8156, 1635.3414, 1656.0372, 1775.8423, 1692.1842, 662.5563, 1752.4215, 1873.9222, 1706.471]
2025-09-12 11:28:13,264 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 11:28:13,274 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 13 hours, 28 minutes, 38 seconds)
2025-09-12 11:38:56,754 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:38:56,758 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:43:16,285 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1729.11426 ± 89.062
2025-09-12 11:43:16,286 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1727.8645, 1685.9844, 1963.231, 1630.6195, 1672.1959, 1755.8713, 1679.7273, 1685.7622, 1790.613, 1699.2732]
2025-09-12 11:43:16,286 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 11:43:16,286 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (1729.11) for latency MM1Queue_a033_s075
2025-09-12 11:43:16,318 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 13 hours, 14 minutes, 42 seconds)
2025-09-12 11:54:00,263 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:54:00,288 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:58:20,572 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1591.28125 ± 417.182
2025-09-12 11:58:20,574 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1716.2985, 1644.5463, 1566.571, 1739.8794, 1879.7347, 1764.4746, 364.15317, 1745.1206, 1680.2568, 1811.7777]
2025-09-12 11:58:20,574 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 11:58:20,587 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 13 hours, 1 minute, 20 seconds)
2025-09-12 12:09:03,462 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:09:03,467 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:13:23,091 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1454.12915 ± 370.129
2025-09-12 12:13:23,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1654.3699, 1654.4512, 1640.5878, 1618.1974, 1024.8147, 1300.8744, 579.6975, 1928.9191, 1611.4282, 1527.9521]
2025-09-12 12:13:23,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 12:13:23,124 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 12 hours, 46 minutes, 38 seconds)
2025-09-12 12:24:04,686 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:24:04,691 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:28:25,158 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1747.91150 ± 58.425
2025-09-12 12:28:25,159 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1786.7572, 1779.5701, 1777.9247, 1620.9136, 1824.394, 1768.3868, 1781.3715, 1676.9332, 1702.9242, 1759.9409]
2025-09-12 12:28:25,159 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 12:28:25,159 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (1747.91) for latency MM1Queue_a033_s075
2025-09-12 12:28:25,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 12 hours, 31 minutes, 37 seconds)
2025-09-12 12:39:09,136 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:39:09,140 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:43:28,374 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1735.64233 ± 110.086
2025-09-12 12:43:28,377 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1751.0414, 1673.8374, 1823.5387, 1873.2407, 1825.1316, 1692.3363, 1727.7704, 1571.2887, 1547.705, 1870.5328]
2025-09-12 12:43:28,377 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 12:43:28,390 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 12 hours, 17 minutes, 28 seconds)
2025-09-12 12:54:11,889 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:54:11,901 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:58:31,889 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1750.59119 ± 186.966
2025-09-12 12:58:31,890 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1735.4008, 1779.889, 1679.7064, 1287.9192, 1749.6498, 1712.5985, 1947.037, 1860.0907, 1725.9855, 2027.6351]
2025-09-12 12:58:31,890 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 12:58:31,890 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (1750.59) for latency MM1Queue_a033_s075
2025-09-12 12:58:31,926 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 12 hours, 2 minutes, 29 seconds)
2025-09-12 13:09:11,826 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:09:11,831 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:13:26,351 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1825.53345 ± 43.092
2025-09-12 13:13:26,352 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1893.6643, 1879.2656, 1794.2666, 1868.9927, 1817.5836, 1744.2124, 1818.1864, 1787.5508, 1826.4203, 1825.1918]
2025-09-12 13:13:26,352 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 13:13:26,352 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (1825.53) for latency MM1Queue_a033_s075
2025-09-12 13:13:26,363 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 11 hours, 45 minutes, 54 seconds)
2025-09-12 13:24:06,963 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:24:06,968 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:28:26,756 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1807.89355 ± 125.793
2025-09-12 13:28:26,757 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1828.8477, 1819.8494, 2088.065, 1871.2264, 1656.0186, 1797.4338, 1923.9973, 1674.1918, 1735.2925, 1684.0128]
2025-09-12 13:28:26,757 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 13:28:26,764 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 11 hours, 30 minutes, 33 seconds)
2025-09-12 13:39:07,961 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:39:07,967 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:43:26,651 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1780.04333 ± 106.073
2025-09-12 13:43:26,652 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1705.7927, 1932.2821, 1955.8743, 1869.5652, 1609.3525, 1719.4186, 1710.5197, 1771.4442, 1820.5457, 1705.6393]
2025-09-12 13:43:26,652 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 13:43:26,701 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 11 hours, 15 minutes, 13 seconds)
2025-09-12 13:54:09,649 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:54:09,654 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:58:29,219 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1888.47131 ± 160.929
2025-09-12 13:58:29,221 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2005.594, 1957.3857, 2109.446, 1926.8461, 1583.7354, 1799.5337, 1759.686, 1698.2163, 2000.6285, 2043.6405]
2025-09-12 13:58:29,221 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 13:58:29,221 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (1888.47) for latency MM1Queue_a033_s075
2025-09-12 13:58:29,234 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 11 hours, 7 seconds)
2025-09-12 14:09:10,705 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:09:10,710 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:13:32,421 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1839.34766 ± 133.446
2025-09-12 14:13:32,422 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1856.2731, 2002.3154, 1771.6195, 1979.481, 1972.2413, 1598.026, 1699.2393, 1719.6559, 1969.5469, 1825.0789]
2025-09-12 14:13:32,422 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 14:13:32,433 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 10 hours, 45 minutes, 4 seconds)
2025-09-12 14:24:15,076 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:24:15,085 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:28:35,241 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1775.86780 ± 281.796
2025-09-12 14:28:35,243 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1924.0116, 1759.9435, 1889.1003, 1963.0311, 949.6299, 1813.2272, 1786.4019, 1888.7417, 1894.7881, 1889.8029]
2025-09-12 14:28:35,243 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 14:28:35,257 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 10 hours, 31 minutes, 14 seconds)
2025-09-12 14:39:16,492 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:39:16,497 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:43:35,940 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1673.08533 ± 411.683
2025-09-12 14:43:35,941 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1977.6918, 1804.173, 1653.2836, 1881.0182, 579.594, 1693.848, 1989.2086, 1959.149, 1871.7247, 1321.1625]
2025-09-12 14:43:35,941 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 14:43:35,949 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 10 hours, 16 minutes, 15 seconds)
2025-09-12 14:54:17,275 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:54:17,280 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:58:37,341 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1775.52246 ± 129.096
2025-09-12 14:58:37,361 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1668.681, 1744.3727, 1939.2355, 1726.1512, 1730.3652, 1828.6376, 1547.9365, 1714.2164, 1832.336, 2023.2926]
2025-09-12 14:58:37,361 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 14:58:37,372 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 10 hours, 1 minute, 25 seconds)
2025-09-12 15:09:19,242 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:09:19,248 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:13:38,829 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1751.05347 ± 492.003
2025-09-12 15:13:38,830 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2023.5133, 1993.5088, 1808.1132, 1877.0232, 2028.1694, 2000.2043, 1792.2507, 326.4034, 2037.6193, 1623.7314]
2025-09-12 15:13:38,830 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 15:13:38,844 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 9 hours, 46 minutes, 14 seconds)
2025-09-12 15:24:20,681 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:24:20,686 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:28:39,853 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1799.23901 ± 89.820
2025-09-12 15:28:39,855 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1868.7399, 1745.7778, 1633.207, 1877.1603, 1778.9409, 1730.6526, 1784.3549, 1733.1549, 1906.3918, 1934.0116]
2025-09-12 15:28:39,855 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 15:28:39,907 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 9 hours, 30 minutes, 56 seconds)
2025-09-12 15:39:23,099 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:39:23,105 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:43:41,418 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1670.29761 ± 402.916
2025-09-12 15:43:41,419 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1984.3706, 2009.8292, 2014.8113, 1020.62354, 1694.1173, 1811.9092, 1820.8739, 1444.1404, 868.81384, 2033.4869]
2025-09-12 15:43:41,419 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 15:43:41,431 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 9 hours, 15 minutes, 45 seconds)
2025-09-12 15:54:24,754 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:54:24,762 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:58:42,836 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1866.43201 ± 85.541
2025-09-12 15:58:42,838 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1673.3539, 1932.7827, 1911.2845, 1945.1935, 1786.953, 1840.5969, 1820.7357, 1859.2755, 1971.0125, 1923.1321]
2025-09-12 15:58:42,838 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 15:58:42,849 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 9 hours, 49 seconds)
2025-09-12 16:09:21,205 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:09:21,209 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:13:40,555 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1839.72827 ± 263.003
2025-09-12 16:13:40,555 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1312.0874, 1827.7816, 1458.7546, 1807.4763, 2179.854, 1791.7258, 1871.6892, 1946.8418, 2115.5747, 2085.4978]
2025-09-12 16:13:40,555 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 16:13:40,570 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 8 hours, 45 minutes, 22 seconds)
2025-09-12 16:24:20,157 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:24:20,162 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:28:40,533 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1926.83398 ± 117.496
2025-09-12 16:28:40,541 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1986.4797, 2052.7605, 1926.428, 1932.1592, 1638.1852, 1954.3419, 1935.1102, 1901.6573, 2093.083, 1848.1343]
2025-09-12 16:28:40,541 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 16:28:40,541 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (1926.83) for latency MM1Queue_a033_s075
2025-09-12 16:28:40,552 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 8 hours, 30 minutes, 11 seconds)
2025-09-12 16:39:21,036 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:39:21,041 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:43:38,192 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1882.41479 ± 148.659
2025-09-12 16:43:38,194 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2166.1616, 1870.2249, 1712.1045, 2034.2267, 1721.7993, 1740.9786, 1995.8368, 1990.9376, 1757.9755, 1833.905]
2025-09-12 16:43:38,194 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 16:43:38,209 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 8 hours, 14 minutes, 48 seconds)
2025-09-12 16:54:18,058 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:54:18,062 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:58:37,343 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1937.49780 ± 128.511
2025-09-12 16:58:37,344 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2002.1146, 1957.2751, 2067.531, 1839.734, 1741.3718, 1843.9249, 1887.8142, 2224.3584, 1899.0468, 1911.8071]
2025-09-12 16:58:37,344 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 16:58:37,344 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (1937.50) for latency MM1Queue_a033_s075
2025-09-12 16:58:37,371 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 7 hours, 59 minutes, 34 seconds)
2025-09-12 17:09:18,502 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:09:18,507 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:13:37,572 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1931.22595 ± 135.857
2025-09-12 17:13:37,573 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1890.1155, 2015.6753, 1921.4001, 2011.0051, 2235.9006, 1818.7146, 1863.2766, 1841.663, 1999.0653, 1715.4436]
2025-09-12 17:13:37,573 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 17:13:37,585 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 7 hours, 44 minutes, 27 seconds)
2025-09-12 17:24:19,266 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:24:19,271 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:28:33,920 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1782.43164 ± 505.035
2025-09-12 17:28:33,921 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1935.7767, 1745.7418, 2171.5496, 311.42316, 2051.796, 1851.3856, 1809.5515, 2005.1395, 2047.5303, 1894.4219]
2025-09-12 17:28:33,921 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 17:28:33,929 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 7 hours, 29 minutes, 20 seconds)
2025-09-12 17:39:15,776 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:39:15,787 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:43:34,057 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2021.18921 ± 114.495
2025-09-12 17:43:34,059 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2124.3508, 2086.7896, 2099.908, 1745.0499, 2030.7128, 2109.901, 1954.9579, 1977.8505, 2137.8086, 1944.5651]
2025-09-12 17:43:34,059 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 17:43:34,059 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (2021.19) for latency MM1Queue_a033_s075
2025-09-12 17:43:34,086 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 7 hours, 14 minutes, 22 seconds)
2025-09-12 17:54:15,543 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:54:15,553 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:58:35,556 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1863.70679 ± 358.146
2025-09-12 17:58:35,567 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1885.591, 1858.2241, 2203.8284, 1814.5791, 2107.7825, 2064.822, 1819.9783, 858.6537, 1947.9708, 2075.6387]
2025-09-12 17:58:35,567 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 17:58:35,588 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 6 hours, 59 minutes, 45 seconds)
2025-09-12 18:09:18,880 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:09:18,885 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:13:37,675 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1840.43530 ± 306.296
2025-09-12 18:13:37,676 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1994.512, 1591.5549, 2021.5334, 1843.6842, 1933.389, 1964.5636, 1947.7894, 1006.2975, 2045.2578, 2055.772]
2025-09-12 18:13:37,676 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 18:13:37,693 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 6 hours, 45 minutes, 1 second)
2025-09-12 18:24:19,270 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:24:19,275 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:28:36,124 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2048.49805 ± 114.598
2025-09-12 18:28:36,125 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1934.4032, 2133.6272, 2157.2, 2160.8608, 1910.2987, 2063.821, 2139.2637, 2046.5325, 1816.9087, 2122.0657]
2025-09-12 18:28:36,125 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 18:28:36,125 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (2048.50) for latency MM1Queue_a033_s075
2025-09-12 18:28:36,133 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 6 hours, 29 minutes, 52 seconds)
2025-09-12 18:39:17,973 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:39:17,978 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:43:37,759 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2040.20312 ± 181.763
2025-09-12 18:43:37,761 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1794.248, 2144.5713, 2056.4607, 2030.258, 2151.8381, 1929.5521, 2084.359, 1679.5566, 2253.238, 2277.9495]
2025-09-12 18:43:37,761 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 18:43:37,778 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 6 hours, 15 minutes, 19 seconds)
2025-09-12 18:54:19,944 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:54:19,950 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:58:38,423 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2061.18213 ± 189.134
2025-09-12 18:58:38,424 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2041.5684, 2121.5115, 1538.8057, 2253.7234, 2138.3638, 2172.1545, 2027.3796, 2216.1372, 2058.6, 2043.5785]
2025-09-12 18:58:38,424 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 18:58:38,424 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (2061.18) for latency MM1Queue_a033_s075
2025-09-12 18:58:38,433 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 6 hours, 20 seconds)
2025-09-12 19:09:20,105 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:09:20,110 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:13:39,159 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2031.13379 ± 121.943
2025-09-12 19:13:39,160 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2032.941, 2253.8086, 1951.8676, 1889.2762, 1900.5442, 1973.1223, 2229.969, 2110.434, 1951.9115, 2017.4633]
2025-09-12 19:13:39,160 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 19:13:39,173 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 5 hours, 45 minutes, 16 seconds)
2025-09-12 19:24:16,935 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:24:16,940 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:28:31,131 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2008.21484 ± 94.833
2025-09-12 19:28:31,132 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2113.7646, 2002.315, 1993.7285, 2040.9546, 1936.1681, 2082.5593, 2114.709, 1784.9521, 1953.7551, 2059.2427]
2025-09-12 19:28:31,132 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 19:28:31,147 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 5 hours, 29 minutes, 31 seconds)
2025-09-12 19:39:09,868 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:39:09,873 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:43:28,784 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2042.30432 ± 150.502
2025-09-12 19:43:28,785 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2015.6963, 1948.3158, 2088.103, 2017.2605, 1947.2241, 1773.4697, 2260.071, 2228.8682, 2228.4365, 1915.5972]
2025-09-12 19:43:28,785 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 19:43:28,800 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 5 hours, 14 minutes, 29 seconds)
2025-09-12 19:54:05,991 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:54:05,996 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:58:27,762 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1861.87817 ± 444.513
2025-09-12 19:58:27,763 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2082.8552, 2231.5535, 2174.88, 1953.7317, 2093.6938, 912.47437, 1979.9452, 2119.6628, 1068.7228, 2001.2612]
2025-09-12 19:58:27,763 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 19:58:27,776 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 4 hours, 59 minutes, 19 seconds)
2025-09-12 20:09:20,376 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:09:20,381 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:13:38,080 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2032.38770 ± 142.490
2025-09-12 20:13:38,081 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2095.193, 2214.0986, 2154.9114, 2271.1323, 1964.1656, 1874.825, 2018.6936, 2023.248, 1870.0062, 1837.6013]
2025-09-12 20:13:38,081 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:13:38,118 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 4 hours, 44 minutes, 58 seconds)
2025-09-12 20:24:29,845 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:24:29,850 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:28:52,592 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1945.77576 ± 134.311
2025-09-12 20:28:52,593 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2188.4758, 1897.1315, 2110.08, 1859.5028, 1878.5452, 1737.9387, 1943.8567, 2078.362, 1806.9845, 1956.8818]
2025-09-12 20:28:52,593 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:28:52,612 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 4 hours, 30 minutes, 48 seconds)
2025-09-12 20:39:44,172 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:39:44,184 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:44:08,454 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1918.20312 ± 185.248
2025-09-12 20:44:08,455 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2152.3733, 2003.022, 1704.5278, 2124.113, 1628.6989, 1890.5758, 2152.4446, 1994.6862, 1740.0073, 1791.5828]
2025-09-12 20:44:08,455 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:44:08,473 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 4 hours, 17 minutes, 6 seconds)
2025-09-12 20:55:03,200 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:55:03,204 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:59:22,627 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1982.22717 ± 275.705
2025-09-12 20:59:22,635 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1238.1418, 2089.2576, 2276.2935, 2077.744, 2185.85, 1855.3292, 2080.7537, 1940.3887, 2148.5413, 1929.9735]
2025-09-12 20:59:22,635 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:59:22,654 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 4 hours, 2 minutes, 52 seconds)
2025-09-12 21:10:17,875 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:10:17,885 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:14:37,863 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2027.80249 ± 115.064
2025-09-12 21:14:37,864 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2086.5747, 1851.9998, 2171.298, 2067.2656, 2062.4255, 1956.9988, 1933.5902, 1891.142, 2023.7256, 2233.0054]
2025-09-12 21:14:37,864 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:14:37,877 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 3 hours, 48 minutes, 30 seconds)
2025-09-12 21:25:33,462 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:25:33,466 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:29:56,167 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1893.87268 ± 500.859
2025-09-12 21:29:56,175 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [395.86703, 2057.6416, 1979.8036, 2086.1987, 2061.8774, 2068.2456, 2111.0203, 2069.6057, 2105.6174, 2002.8488]
2025-09-12 21:29:56,175 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:29:56,202 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 3 hours, 33 minutes, 38 seconds)
2025-09-12 21:40:54,459 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:40:54,465 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:45:19,620 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1969.69397 ± 281.734
2025-09-12 21:45:19,627 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1650.0074, 2236.2717, 2135.8477, 1315.5286, 2058.2024, 2344.4226, 1977.4309, 2089.069, 1921.0044, 1969.1537]
2025-09-12 21:45:19,627 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:45:19,643 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 3 hours, 18 minutes, 46 seconds)
2025-09-12 21:56:17,188 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:56:17,193 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:00:40,716 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2088.28174 ± 117.077
2025-09-12 22:00:40,730 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2191.6135, 2228.3743, 1936.5101, 2046.7217, 2142.2646, 1852.1553, 2119.2573, 2226.7861, 2034.2693, 2104.865]
2025-09-12 22:00:40,730 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:00:40,730 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (2088.28) for latency MM1Queue_a033_s075
2025-09-12 22:00:40,753 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 3 hours, 3 minutes, 41 seconds)
2025-09-12 22:11:36,985 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:11:36,991 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:16:00,695 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2087.30615 ± 347.021
2025-09-12 22:16:00,696 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2019.4879, 2389.6675, 2140.551, 1116.7885, 2277.679, 2297.27, 2277.8215, 2229.4622, 1949.5227, 2174.8098]
2025-09-12 22:16:00,696 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:16:00,742 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 2 hours, 48 minutes, 35 seconds)
2025-09-12 22:26:54,543 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:26:54,548 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:31:15,922 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2146.58057 ± 101.716
2025-09-12 22:31:15,934 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2179.744, 2110.5225, 2285.2822, 2088.4307, 2023.0416, 2224.7954, 2030.9514, 2285.9448, 2012.0153, 2225.078]
2025-09-12 22:31:15,934 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:31:15,934 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (2146.58) for latency MM1Queue_a033_s075
2025-09-12 22:31:15,953 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 2 hours, 33 minutes, 16 seconds)
2025-09-12 22:42:09,560 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:42:09,564 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:46:33,474 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1922.59497 ± 362.427
2025-09-12 22:46:33,475 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2281.3203, 2083.1584, 2054.486, 1951.681, 888.077, 2032.5782, 1934.9292, 1849.5073, 2132.0635, 2018.1492]
2025-09-12 22:46:33,475 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:46:33,498 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 2 hours, 17 minutes, 55 seconds)
2025-09-12 22:57:28,230 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:57:28,234 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 23:01:52,722 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1984.36890 ± 491.406
2025-09-12 23:01:52,741 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2053.3013, 2155.3901, 2152.7058, 2308.0513, 1922.3088, 579.0849, 1950.3508, 2443.7898, 2205.4934, 2073.2136]
2025-09-12 23:01:52,741 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 23:01:52,754 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 2 hours, 2 minutes, 28 seconds)
2025-09-12 23:12:48,705 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:12:48,716 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 23:17:14,115 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2054.90967 ± 148.161
2025-09-12 23:17:14,116 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1917.1029, 2076.3147, 1988.1122, 1883.3519, 2384.9756, 2175.168, 1889.2314, 2132.0615, 2119.7346, 1983.045]
2025-09-12 23:17:14,116 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 23:17:14,140 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 47 minutes, 10 seconds)
2025-09-12 23:28:08,889 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:28:08,900 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 23:32:32,309 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2243.67505 ± 122.992
2025-09-12 23:32:32,312 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2095.602, 2268.0056, 2511.558, 2177.1572, 2267.5544, 2347.5508, 2189.2983, 2335.8445, 2131.8176, 2112.3608]
2025-09-12 23:32:32,312 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 23:32:32,312 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1226 [INFO]: New best (2243.68) for latency MM1Queue_a033_s075
2025-09-12 23:32:32,331 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 31 minutes, 49 seconds)
2025-09-12 23:43:27,351 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:43:27,362 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 23:47:51,450 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2145.73779 ± 125.455
2025-09-12 23:47:51,451 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2168.2258, 1953.7504, 2040.3568, 2267.6, 1986.7094, 2331.9214, 2271.0942, 2234.1897, 2043.8485, 2159.6829]
2025-09-12 23:47:51,451 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 23:47:51,470 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 16 minutes, 35 seconds)
2025-09-12 23:58:45,952 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:58:45,963 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 00:03:09,713 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2146.73804 ± 125.887
2025-09-13 00:03:09,714 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2041.7833, 2093.3408, 2281.586, 2043.1549, 2074.157, 2363.5803, 2156.185, 2231.0508, 1933.0688, 2249.474]
2025-09-13 00:03:09,714 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 00:03:09,728 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 1 hour, 1 minute, 16 seconds)
2025-09-13 00:14:02,309 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:14:02,319 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 00:18:21,747 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1990.28284 ± 272.676
2025-09-13 00:18:21,749 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2169.818, 1771.28, 1884.5641, 2225.0232, 2084.8518, 1953.8524, 2319.022, 1356.0492, 2248.0715, 1890.2953]
2025-09-13 00:18:21,749 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 00:18:21,769 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 45 minutes, 53 seconds)
2025-09-13 00:29:15,084 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:29:15,096 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 00:33:37,078 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2133.07520 ± 147.775
2025-09-13 00:33:37,089 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2249.3394, 2398.9407, 2216.674, 1930.2263, 2168.818, 1881.233, 2098.3813, 2013.3652, 2184.637, 2189.136]
2025-09-13 00:33:37,089 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 00:33:37,107 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 30 minutes, 33 seconds)
2025-09-13 00:44:30,593 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:44:30,598 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 00:48:54,216 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1971.57385 ± 615.845
2025-09-13 00:48:54,217 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2214.1396, 2110.6995, 268.27032, 2241.3887, 2473.5386, 2258.8997, 2249.646, 1506.4928, 2121.3735, 2271.29]
2025-09-13 00:48:54,217 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 00:48:54,240 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 15 minutes, 16 seconds)
2025-09-13 00:59:50,195 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:59:50,201 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 01:04:11,420 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 2075.18921 ± 170.858
2025-09-13 01:04:11,426 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [2046.7065, 2004.6826, 2323.5962, 1915.4271, 2172.9465, 1997.194, 2314.7646, 2114.3606, 1727.608, 2134.608]
2025-09-13 01:04:11,426 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 01:04:11,442 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-halfcheetah):1251 [DEBUG]: Training session finished
