2025-09-11 22:56:19,125 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noiseperc0-halfcheetah/MM1Queue_a033_s075-mbpac_memdelay
2025-09-11 22:56:19,125 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noiseperc0-halfcheetah/MM1Queue_a033_s075-mbpac_memdelay
2025-09-11 22:56:19,125 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x145674da1350>}
2025-09-11 22:56:19,125 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1111 [DEBUG]: using device: cuda
2025-09-11 22:56:19,131 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1133 [INFO]: Creating new trainer
2025-09-11 22:56:19,148 baseline-mbpac-noiseperc0-halfcheetah:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-09-11 22:56:19,148 baseline-mbpac-noiseperc0-halfcheetah:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-11 22:56:19,156 baseline-mbpac-noiseperc0-halfcheetah:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=6, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 384, batch_first=True)
)
2025-09-11 22:56:20,048 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1194 [DEBUG]: Starting training session...
2025-09-11 22:56:20,049 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 1/100
2025-09-11 23:06:56,414 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:06:56,418 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-11 23:11:26,456 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: -244.40793 ± 40.520
2025-09-11 23:11:26,457 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [-164.45314, -239.60075, -263.4813, -230.73904, -211.80411, -242.2751, -274.81436, -220.20808, -321.31277, -275.3907]
2025-09-11 23:11:26,457 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 23:11:26,457 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1226 [INFO]: New best (-244.41) for latency MM1Queue_a033_s075
2025-09-11 23:11:26,462 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 24 hours, 55 minutes, 34 seconds)
2025-09-11 23:23:23,546 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:23:23,552 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-11 23:27:57,910 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: -231.19669 ± 104.486
2025-09-11 23:27:57,911 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [-319.4831, -318.55963, -79.61631, -151.22954, -129.0828, -159.22383, -178.60576, -368.16223, -395.21765, -212.78609]
2025-09-11 23:27:57,911 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 23:27:57,912 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1226 [INFO]: New best (-231.20) for latency MM1Queue_a033_s075
2025-09-11 23:27:57,918 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 25 hours, 49 minutes, 55 seconds)
2025-09-11 23:39:42,739 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:39:42,750 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-11 23:44:15,656 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 750.33087 ± 197.048
2025-09-11 23:44:15,658 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [729.1886, 795.09625, 739.687, 830.6863, 942.5592, 808.736, 1024.5515, 256.03238, 611.9291, 764.84235]
2025-09-11 23:44:15,658 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 23:44:15,658 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1226 [INFO]: New best (750.33) for latency MM1Queue_a033_s075
2025-09-11 23:44:15,666 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 25 hours, 49 minutes, 38 seconds)
2025-09-11 23:56:02,301 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:56:02,306 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 00:00:33,268 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2301.23779 ± 118.196
2025-09-12 00:00:33,275 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [2509.5686, 2133.254, 2388.3809, 2239.8503, 2339.14, 2315.1018, 2090.8716, 2312.6882, 2402.6758, 2280.8481]
2025-09-12 00:00:33,275 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 00:00:33,275 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1226 [INFO]: New best (2301.24) for latency MM1Queue_a033_s075
2025-09-12 00:00:33,277 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 25 hours, 41 minutes, 17 seconds)
2025-09-12 00:12:25,035 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:12:25,040 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 00:18:02,992 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 1694.01929 ± 965.595
2025-09-12 00:18:02,994 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [2606.8335, 2833.695, 438.08405, 2810.1729, 1310.0253, 1212.5635, 1255.2039, 2236.5518, -73.09365, 2310.1577]
2025-09-12 00:18:02,994 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 00:18:03,000 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 25 hours, 52 minutes, 36 seconds)
2025-09-12 00:29:59,587 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:29:59,591 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 00:34:33,874 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2285.65674 ± 867.261
2025-09-12 00:34:33,877 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1468.1421, 2027.7921, 3256.2607, 894.31854, 3105.6226, 3044.7063, 1295.7876, 3541.3298, 2257.6772, 1964.9302]
2025-09-12 00:34:33,877 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 00:34:33,886 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 26 hours, 2 minutes, 43 seconds)
2025-09-12 00:46:32,359 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:46:32,364 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 00:51:03,058 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2871.08643 ± 952.570
2025-09-12 00:51:03,067 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3640.5952, 1516.2968, 3061.65, 715.8522, 3272.257, 3831.4165, 3380.6162, 3240.052, 3494.0286, 2558.099]
2025-09-12 00:51:03,067 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 00:51:03,067 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1226 [INFO]: New best (2871.09) for latency MM1Queue_a033_s075
2025-09-12 00:51:03,074 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 25 hours, 45 minutes, 23 seconds)
2025-09-12 01:02:39,521 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:02:39,525 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:07:15,182 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3204.96655 ± 978.490
2025-09-12 01:07:15,183 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3821.8428, 3201.1367, 3788.4463, 3570.822, 3569.2659, 327.2629, 3615.7979, 3495.3867, 3433.2212, 3226.4844]
2025-09-12 01:07:15,183 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 01:07:15,183 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1226 [INFO]: New best (3204.97) for latency MM1Queue_a033_s075
2025-09-12 01:07:15,190 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 25 hours, 27 minutes, 3 seconds)
2025-09-12 01:19:08,305 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:19:08,310 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:23:42,522 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3715.02466 ± 125.036
2025-09-12 01:23:42,530 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3688.6323, 3604.6946, 3692.4512, 3988.3367, 3901.61, 3607.0125, 3727.8154, 3596.8513, 3715.2913, 3627.5525]
2025-09-12 01:23:42,530 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 01:23:42,531 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1226 [INFO]: New best (3715.02) for latency MM1Queue_a033_s075
2025-09-12 01:23:42,540 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 25 hours, 13 minutes, 24 seconds)
2025-09-12 01:35:25,254 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:35:25,259 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:39:58,450 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3554.36914 ± 1009.162
2025-09-12 01:39:58,450 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3894.6357, 3672.392, 3918.3064, 543.67395, 4083.5947, 3936.052, 3906.9219, 3888.9705, 3746.2773, 3952.8655]
2025-09-12 01:39:58,450 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 01:39:58,454 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 24 hours, 34 minutes, 38 seconds)
2025-09-12 01:51:47,261 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:51:47,266 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:56:18,306 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3604.50659 ± 1154.525
2025-09-12 01:56:18,307 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4185.0845, 4085.017, 4157.47, 3934.8, 4092.8677, 3798.6213, 194.27354, 3614.7363, 3725.031, 4257.1646]
2025-09-12 01:56:18,307 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 01:56:18,320 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 24 hours, 14 minutes, 58 seconds)
2025-09-12 02:08:02,438 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:08:02,443 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:12:30,161 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4141.83301 ± 177.677
2025-09-12 02:12:30,163 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4342.7173, 4302.0923, 4103.07, 3993.5405, 3929.6084, 4232.537, 4441.185, 4071.1401, 4140.3936, 3862.042]
2025-09-12 02:12:30,163 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 02:12:30,163 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1226 [INFO]: New best (4141.83) for latency MM1Queue_a033_s075
2025-09-12 02:12:30,170 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 23 hours, 53 minutes, 32 seconds)
2025-09-12 02:23:51,154 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:23:51,172 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:28:16,506 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3331.91553 ± 1400.460
2025-09-12 02:28:16,507 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4341.2314, 1297.1229, 4164.895, 751.346, 4702.421, 1689.598, 4295.5703, 4285.8066, 4055.457, 3735.7068]
2025-09-12 02:28:16,507 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 02:28:16,521 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 23 hours, 29 minutes, 47 seconds)
2025-09-12 02:39:45,074 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:39:45,079 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:45:10,161 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3872.40894 ± 690.059
2025-09-12 02:45:10,162 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4323.5547, 4069.667, 4039.5522, 1852.1285, 4009.6458, 4378.9717, 4213.5005, 3933.3213, 3922.059, 3981.6873]
2025-09-12 02:45:10,162 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 02:45:10,188 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 23 hours, 21 minutes, 7 seconds)
2025-09-12 02:56:49,622 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:56:49,627 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:01:14,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3620.84253 ± 1270.658
2025-09-12 03:01:14,093 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [771.2423, 4384.28, 3895.3267, 4177.175, 4488.4473, 3637.605, 4582.5728, 1570.2838, 4584.8096, 4116.682]
2025-09-12 03:01:14,093 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 03:01:14,097 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 23 hours, 1 minute, 25 seconds)
2025-09-12 03:12:57,242 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:12:57,257 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:17:20,040 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4075.02271 ± 1345.639
2025-09-12 03:17:20,042 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4603.4204, 4483.7915, 4663.503, 3971.2883, 4385.609, 4684.799, 4829.179, 4391.3833, 94.49262, 4642.759]
2025-09-12 03:17:20,042 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 03:17:20,060 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 22 hours, 41 minutes, 17 seconds)
2025-09-12 03:29:10,011 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:29:10,016 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:33:36,537 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4401.07031 ± 177.190
2025-09-12 03:33:36,553 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4595.9697, 4585.0483, 4534.227, 4502.0786, 4161.6294, 4528.562, 4280.988, 4161.8726, 4158.526, 4501.801]
2025-09-12 03:33:36,553 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 03:33:36,553 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1226 [INFO]: New best (4401.07) for latency MM1Queue_a033_s075
2025-09-12 03:33:36,566 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 22 hours, 26 minutes, 22 seconds)
2025-09-12 03:45:17,061 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:45:17,065 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:50:44,025 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3552.25537 ± 1829.806
2025-09-12 03:50:44,026 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4582.1685, 4581.4727, 116.59154, 4733.8564, 3153.4, 5125.4287, 4455.992, 4309.423, -75.74156, 4539.9663]
2025-09-12 03:50:44,026 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 03:50:44,033 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 22 hours, 32 minutes, 19 seconds)
2025-09-12 04:02:46,510 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:02:46,520 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:07:13,235 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4442.64355 ± 171.265
2025-09-12 04:07:13,250 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4352.0874, 4615.0063, 4284.812, 4593.323, 4556.014, 4156.815, 4309.961, 4316.924, 4530.219, 4711.2734]
2025-09-12 04:07:13,250 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 04:07:13,250 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1226 [INFO]: New best (4442.64) for latency MM1Queue_a033_s075
2025-09-12 04:07:13,267 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 22 hours, 9 minutes, 13 seconds)
2025-09-12 04:18:41,154 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:18:41,159 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:23:07,135 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4493.73438 ± 263.527
2025-09-12 04:23:07,137 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4134.909, 4568.764, 4032.3318, 4589.4985, 4753.241, 4850.9106, 4594.917, 4623.809, 4613.843, 4175.117]
2025-09-12 04:23:07,137 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 04:23:07,137 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1226 [INFO]: New best (4493.73) for latency MM1Queue_a033_s075
2025-09-12 04:23:07,143 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 21 hours, 50 minutes, 8 seconds)
2025-09-12 04:34:57,078 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:34:57,083 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:39:25,075 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4252.00928 ± 724.889
2025-09-12 04:39:25,077 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4139.5195, 4611.289, 4364.0483, 4265.1997, 4361.834, 4386.5005, 2223.8662, 5061.8154, 4308.3906, 4797.6323]
2025-09-12 04:39:25,077 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 04:39:25,087 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 21 hours, 36 minutes, 55 seconds)
2025-09-12 04:50:59,817 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:50:59,822 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:55:26,708 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3996.16748 ± 1495.171
2025-09-12 04:55:26,710 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4607.686, 4819.988, 4983.5903, 4628.8677, 4711.824, 795.43036, 4962.954, 1292.2478, 4928.8477, 4230.2397]
2025-09-12 04:55:26,710 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 04:55:26,750 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 21 hours, 16 minutes, 38 seconds)
2025-09-12 05:07:05,596 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:07:05,607 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:11:30,576 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3946.06104 ± 1695.291
2025-09-12 05:11:30,578 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [540.5746, 4709.9097, 4587.6865, 5037.082, 609.7454, 5139.7, 4848.269, 4555.7925, 4839.1885, 4592.6587]
2025-09-12 05:11:30,578 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 05:11:30,587 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 20 hours, 43 minutes, 56 seconds)
2025-09-12 05:23:03,349 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:23:03,354 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:27:29,399 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4777.97070 ± 186.567
2025-09-12 05:27:29,421 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5053.5835, 4534.115, 4707.534, 5017.2056, 4833.483, 4803.4653, 4415.594, 4837.1743, 4865.1367, 4712.4155]
2025-09-12 05:27:29,421 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 05:27:29,421 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1226 [INFO]: New best (4777.97) for latency MM1Queue_a033_s075
2025-09-12 05:27:29,439 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 20 hours, 20 minutes, 5 seconds)
2025-09-12 05:39:22,277 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:39:22,282 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:43:48,600 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3876.60205 ± 1535.827
2025-09-12 05:43:48,601 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4889.2583, 4574.488, 4349.03, 4524.8657, 4620.758, 4692.1694, 4665.0283, 1727.3773, 83.249245, 4639.792]
2025-09-12 05:43:48,601 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 05:43:48,608 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 20 hours, 10 minutes, 21 seconds)
2025-09-12 05:55:24,537 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:55:24,542 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:00:52,343 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4271.82275 ± 1172.863
2025-09-12 06:00:52,345 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4848.212, 4791.259, 4804.587, 799.8563, 4416.3354, 4887.1177, 4468.7744, 4337.0684, 4548.69, 4816.323]
2025-09-12 06:00:52,345 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 06:00:52,351 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 20 hours, 5 minutes, 31 seconds)
2025-09-12 06:12:45,580 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:12:45,584 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:18:12,933 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4724.19824 ± 306.442
2025-09-12 06:18:12,934 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4204.9673, 4873.728, 5209.091, 4865.0586, 4690.2305, 4338.4873, 4986.835, 5005.5654, 4638.2476, 4429.7793]
2025-09-12 06:18:12,934 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 06:18:12,938 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 20 hours, 8 minutes, 26 seconds)
2025-09-12 06:29:55,602 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:29:55,606 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:34:17,579 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4585.46289 ± 466.046
2025-09-12 06:34:17,580 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5166.3335, 4808.8706, 3675.9624, 4856.4746, 3751.3801, 4513.939, 4600.9233, 4827.0166, 4763.311, 4890.417]
2025-09-12 06:34:17,580 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 06:34:17,585 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 19 hours, 52 minutes, 4 seconds)
2025-09-12 06:45:42,991 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:45:42,999 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:51:07,320 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4339.74512 ± 1164.349
2025-09-12 06:51:07,326 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4629.0303, 4713.7783, 4792.971, 4814.553, 4697.9897, 4707.75, 4714.6772, 4579.191, 4891.5166, 855.99335]
2025-09-12 06:51:07,326 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 06:51:07,334 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 19 hours, 47 minutes, 34 seconds)
2025-09-12 07:02:38,915 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:02:38,920 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:08:04,730 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4792.88916 ± 234.302
2025-09-12 07:08:04,736 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4493.0044, 4700.8804, 4657.58, 5340.404, 5060.663, 4834.6216, 4814.9766, 4660.3633, 4775.2305, 4591.1694]
2025-09-12 07:08:04,736 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 07:08:04,736 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1226 [INFO]: New best (4792.89) for latency MM1Queue_a033_s075
2025-09-12 07:08:04,744 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 19 hours, 39 minutes, 45 seconds)
2025-09-12 07:19:44,119 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:19:44,123 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:24:10,534 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4727.79883 ± 300.495
2025-09-12 07:24:10,535 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4380.6313, 4430.63, 5043.1255, 5016.7007, 4389.9473, 4320.3403, 4717.2676, 4924.958, 4947.1377, 5107.2485]
2025-09-12 07:24:10,535 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 07:24:10,540 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 19 hours, 9 minutes, 35 seconds)
2025-09-12 07:35:50,381 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:35:50,385 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:41:17,198 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4982.63965 ± 268.665
2025-09-12 07:41:17,199 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5143.328, 4761.7876, 4742.317, 5493.8193, 5044.2827, 5205.402, 5036.074, 5106.4673, 4522.955, 4769.968]
2025-09-12 07:41:17,199 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 07:41:17,199 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1226 [INFO]: New best (4982.64) for latency MM1Queue_a033_s075
2025-09-12 07:41:17,204 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 18 hours, 49 minutes, 46 seconds)
2025-09-12 07:52:54,498 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:52:54,503 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:57:22,592 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4003.02002 ± 1707.726
2025-09-12 07:57:22,593 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4730.52, 5108.2812, 4199.79, 837.36053, 4620.846, 4765.37, 4922.6025, 5320.825, 5078.3804, 446.2264]
2025-09-12 07:57:22,593 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 07:57:22,600 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 18 hours, 33 minutes, 19 seconds)
2025-09-12 08:08:55,972 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:08:55,977 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:14:22,793 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4435.70020 ± 920.056
2025-09-12 08:14:22,806 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4871.177, 4491.9966, 1840.8795, 4918.5186, 5055.0093, 4844.2925, 3896.7632, 4672.2017, 4816.5996, 4949.563]
2025-09-12 08:14:22,806 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 08:14:22,817 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 18 hours, 19 minutes)
2025-09-12 08:26:04,534 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:26:04,539 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:30:31,719 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4959.91309 ± 306.373
2025-09-12 08:30:31,721 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4761.7256, 5129.9326, 5446.2695, 5434.629, 4908.2305, 4741.847, 4390.2314, 4845.8203, 5071.641, 4868.7993]
2025-09-12 08:30:31,721 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 08:30:31,731 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 17 hours, 51 minutes, 50 seconds)
2025-09-12 08:42:10,893 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:42:10,899 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:46:36,870 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4369.54053 ± 1666.512
2025-09-12 08:46:36,872 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5435.94, 4955.552, 4933.1924, 2633.7808, -77.37896, 5210.669, 5114.311, 5302.927, 5145.961, 5040.451]
2025-09-12 08:46:36,872 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 08:46:36,882 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 17 hours, 35 minutes, 13 seconds)
2025-09-12 08:58:06,688 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:58:06,692 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:03:34,944 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3972.24854 ± 1855.725
2025-09-12 09:03:34,945 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [842.5763, 3111.5073, 5640.7627, 2979.7983, 5372.253, 5633.2437, 5116.8843, 647.96246, 5404.143, 4973.352]
2025-09-12 09:03:34,945 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 09:03:34,951 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 17 hours, 16 minutes, 55 seconds)
2025-09-12 09:15:12,607 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:15:12,612 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:19:41,242 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5096.72217 ± 185.415
2025-09-12 09:19:41,242 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5067.13, 5160.69, 5243.379, 5332.726, 5379.4585, 5081.777, 4808.649, 4844.4233, 4914.0986, 5134.8906]
2025-09-12 09:19:41,242 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 09:19:41,242 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1226 [INFO]: New best (5096.72) for latency MM1Queue_a033_s075
2025-09-12 09:19:41,249 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 17 hours, 39 seconds)
2025-09-12 09:31:20,893 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:31:20,898 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:36:49,136 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5074.11084 ± 263.567
2025-09-12 09:36:49,138 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5442.364, 4770.4062, 5171.182, 4577.0015, 5033.9146, 4864.11, 5360.434, 5058.0703, 5359.4697, 5104.158]
2025-09-12 09:36:49,138 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 09:36:49,145 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 16 hours, 45 minutes, 45 seconds)
2025-09-12 09:48:53,170 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:48:53,175 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:53:19,272 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4332.73682 ± 1121.140
2025-09-12 09:53:19,273 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [2655.6855, 4976.0474, 4609.1006, 4961.8184, 2839.0427, 5334.709, 2462.9407, 4942.475, 5292.0786, 5253.4683]
2025-09-12 09:53:19,273 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 09:53:19,281 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 16 hours, 33 minutes, 30 seconds)
2025-09-12 10:04:50,335 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:04:50,341 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:10:18,045 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4418.93262 ± 1365.898
2025-09-12 10:10:18,052 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4995.7183, 4880.8145, 4201.2334, 6025.8564, 3433.6897, 5392.7617, 4498.3906, 4559.1226, 5343.809, 857.9295]
2025-09-12 10:10:18,052 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 10:10:18,062 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 16 hours, 27 minutes, 29 seconds)
2025-09-12 10:21:51,864 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:21:51,869 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:26:18,839 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5083.36328 ± 672.799
2025-09-12 10:26:18,841 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5463.168, 5234.668, 5508.7983, 3468.9358, 5457.7163, 5817.264, 5284.837, 5173.3574, 5246.0483, 4178.8413]
2025-09-12 10:26:18,841 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 10:26:18,848 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 15 hours, 59 minutes, 41 seconds)
2025-09-12 10:37:47,708 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:37:47,714 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:42:13,447 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4838.38574 ± 1335.022
2025-09-12 10:42:13,448 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4787.3853, 930.1836, 4878.5815, 5247.65, 5509.6523, 5606.3745, 5210.5557, 5469.4316, 5711.4395, 5032.5977]
2025-09-12 10:42:13,449 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 10:42:13,460 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 15 hours, 40 minutes, 55 seconds)
2025-09-12 10:53:45,688 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:53:45,692 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:58:09,060 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4649.42480 ± 1263.704
2025-09-12 10:58:09,061 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [963.6464, 4982.858, 5592.662, 4582.3833, 4934.9834, 4793.993, 5041.216, 5178.227, 5536.391, 4887.891]
2025-09-12 10:58:09,061 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 10:58:09,068 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 15 hours, 10 minutes, 55 seconds)
2025-09-12 11:09:34,504 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:09:34,509 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:13:58,490 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5179.38477 ± 329.385
2025-09-12 11:13:58,495 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5425.1973, 5300.7056, 5067.2437, 4706.107, 5420.882, 5169.678, 5391.77, 5549.5015, 4460.0044, 5302.7563]
2025-09-12 11:13:58,495 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 11:13:58,495 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1226 [INFO]: New best (5179.38) for latency MM1Queue_a033_s075
2025-09-12 11:13:58,505 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 14 hours, 47 minutes, 11 seconds)
2025-09-12 11:25:31,117 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:25:31,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:30:57,719 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4794.60840 ± 1267.757
2025-09-12 11:30:57,720 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5402.52, 1611.8657, 5178.0835, 5939.707, 5830.1187, 4692.3584, 3358.064, 5644.255, 5120.741, 5168.374]
2025-09-12 11:30:57,720 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 11:30:57,728 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 14 hours, 31 minutes, 8 seconds)
2025-09-12 11:42:38,567 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:42:38,572 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:48:05,765 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4804.28076 ± 1096.017
2025-09-12 11:48:05,767 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5698.8057, 5520.846, 4681.6416, 5193.8994, 1826.584, 5159.1704, 5327.428, 3978.681, 5461.709, 5194.0415]
2025-09-12 11:48:05,767 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 11:48:05,776 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 14 hours, 26 minutes, 53 seconds)
2025-09-12 12:00:01,234 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:00:01,238 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:04:27,555 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4274.99707 ± 2150.533
2025-09-12 12:04:27,556 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5889.823, 5779.334, 5230.537, 948.8509, 5673.7637, 5443.5547, 2457.8403, 5754.6597, -56.029, 5627.631]
2025-09-12 12:04:27,556 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 12:04:27,562 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 14 hours, 15 minutes, 14 seconds)
2025-09-12 12:16:01,331 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:16:01,343 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:20:26,969 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5225.39893 ± 938.062
2025-09-12 12:20:26,972 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5151.604, 6076.246, 5283.3496, 5719.8716, 5189.185, 5563.328, 5429.4766, 2525.9133, 5705.145, 5609.8726]
2025-09-12 12:20:26,972 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 12:20:26,972 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1226 [INFO]: New best (5225.40) for latency MM1Queue_a033_s075
2025-09-12 12:20:26,981 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 13 hours, 59 minutes, 26 seconds)
2025-09-12 12:32:13,471 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:32:13,475 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:36:39,043 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5166.39160 ± 1326.959
2025-09-12 12:36:39,045 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5631.098, 5149.7524, 5133.4756, 5864.027, 1256.9376, 5792.776, 5801.5903, 5537.658, 5679.5063, 5817.092]
2025-09-12 12:36:39,045 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 12:36:39,064 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 13 hours, 46 minutes, 45 seconds)
2025-09-12 12:48:05,023 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:48:05,028 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:52:31,537 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4562.31787 ± 1993.225
2025-09-12 12:52:31,538 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5490.7827, 5506.407, 5432.961, 5147.988, 5565.005, 749.82855, 5631.362, 5617.0386, 6032.589, 449.21298]
2025-09-12 12:52:31,538 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 12:52:31,546 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 13 hours, 19 minutes, 19 seconds)
2025-09-12 13:03:59,296 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:03:59,301 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:08:26,522 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4989.85742 ± 1002.456
2025-09-12 13:08:26,525 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5086.9517, 5633.2173, 5731.1675, 2295.5386, 5375.328, 5609.5786, 5317.5747, 4068.6343, 5328.81, 5451.7734]
2025-09-12 13:08:26,525 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 13:08:26,534 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 12 hours, 51 minutes, 19 seconds)
2025-09-12 13:19:46,094 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:19:46,098 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:25:12,815 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4965.84375 ± 941.540
2025-09-12 13:25:12,817 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5575.139, 5843.3184, 5137.361, 5639.302, 4028.5208, 5522.1016, 2563.7139, 5479.0625, 4842.0103, 5027.911]
2025-09-12 13:25:12,817 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 13:25:12,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 12 hours, 39 minutes, 5 seconds)
2025-09-12 13:36:55,786 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:36:55,791 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:41:22,296 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4984.74951 ± 1253.747
2025-09-12 13:41:22,298 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5483.571, 6072.9453, 5451.423, 5390.8994, 5623.752, 5562.979, 4828.208, 5039.9365, 5032.6504, 1361.1377]
2025-09-12 13:41:22,298 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 13:41:22,313 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 12 hours, 24 minutes, 29 seconds)
2025-09-12 13:53:01,755 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:53:01,760 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:57:28,839 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5564.09863 ± 867.071
2025-09-12 13:57:28,840 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5223.9863, 6467.6475, 6213.403, 3272.1238, 5957.6685, 6074.789, 5694.4316, 5248.527, 5347.873, 6140.531]
2025-09-12 13:57:28,840 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 13:57:28,840 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1226 [INFO]: New best (5564.10) for latency MM1Queue_a033_s075
2025-09-12 13:57:28,848 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 12 hours, 7 minutes, 28 seconds)
2025-09-12 14:09:05,985 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:09:05,990 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:13:32,356 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5153.56104 ± 996.745
2025-09-12 14:13:32,371 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3788.2095, 5308.6245, 5661.932, 5401.8774, 2762.8022, 5487.667, 5739.3857, 6056.5347, 5995.2573, 5333.321]
2025-09-12 14:13:32,371 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 14:13:32,384 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 11 hours, 52 minutes, 55 seconds)
2025-09-12 14:25:10,171 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:25:10,178 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:30:36,937 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4373.14502 ± 1810.629
2025-09-12 14:30:36,938 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5370.698, 740.7154, 870.6409, 4646.906, 5741.264, 5169.4673, 4894.529, 5619.831, 5504.6743, 5172.7227]
2025-09-12 14:30:36,938 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 14:30:36,947 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 11 hours, 46 minutes, 41 seconds)
2025-09-12 14:42:15,114 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:42:15,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:46:38,896 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5049.33838 ± 1759.332
2025-09-12 14:46:38,896 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5507.0376, 5279.624, 5989.069, 5576.609, 5321.7007, 5717.615, 5569.802, 5529.0557, 6173.124, -170.25735]
2025-09-12 14:46:38,896 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 14:46:38,905 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 11 hours, 24 minutes, 3 seconds)
2025-09-12 14:58:18,207 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:58:18,211 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:02:41,610 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5397.10889 ± 174.734
2025-09-12 15:02:41,611 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5595.1035, 5266.4497, 5499.98, 5266.1665, 5265.796, 5310.7837, 5808.79, 5260.4565, 5305.212, 5392.354]
2025-09-12 15:02:41,611 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 15:02:41,621 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 11 hours, 6 minutes, 50 seconds)
2025-09-12 15:14:07,746 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:14:07,750 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:18:31,621 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4924.50830 ± 1668.156
2025-09-12 15:18:31,629 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5948.425, 5968.9194, 446.36026, 3474.1814, 5693.745, 4831.9478, 6077.3813, 5218.5825, 5853.416, 5732.1216]
2025-09-12 15:18:31,629 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 15:18:31,642 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 10 hours, 48 minutes, 22 seconds)
2025-09-12 15:30:08,907 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:30:08,926 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:34:30,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5278.59570 ± 328.513
2025-09-12 15:34:30,831 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5304.4854, 5025.07, 5065.426, 6024.03, 4820.167, 5308.709, 5221.69, 5026.54, 5350.489, 5639.35]
2025-09-12 15:34:30,832 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 15:34:30,844 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 10 hours, 31 minutes, 35 seconds)
2025-09-12 15:45:58,290 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:45:58,294 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:50:20,736 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5492.62793 ± 897.954
2025-09-12 15:50:20,737 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [6091.0317, 5273.859, 6045.5107, 2935.1416, 5436.642, 5692.5796, 5864.221, 5919.2593, 6161.549, 5506.486]
2025-09-12 15:50:20,737 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 15:50:20,745 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 10 hours, 5 minutes, 56 seconds)
2025-09-12 16:02:00,522 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:02:00,527 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:06:23,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5717.17188 ± 693.286
2025-09-12 16:06:23,832 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [6105.6694, 5842.98, 3889.7168, 6338.8423, 5248.384, 6124.0923, 5958.0464, 5493.586, 6358.494, 5811.908]
2025-09-12 16:06:23,832 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 16:06:23,832 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1226 [INFO]: New best (5717.17) for latency MM1Queue_a033_s075
2025-09-12 16:06:23,844 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 9 hours, 50 minutes, 8 seconds)
2025-09-12 16:17:48,094 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:17:48,099 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:22:14,333 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5316.50537 ± 1348.386
2025-09-12 16:22:14,337 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [6014.6904, 5813.3003, 6083.9014, 4981.839, 6048.7236, 1473.2463, 6073.6787, 4960.218, 6134.673, 5580.785]
2025-09-12 16:22:14,337 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 16:22:14,349 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 9 hours, 32 minutes, 43 seconds)
2025-09-12 16:33:44,974 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:33:44,979 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:38:10,573 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4964.85254 ± 1148.565
2025-09-12 16:38:10,585 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3163.205, 5924.71, 2857.1506, 4836.9785, 6062.0215, 5358.2676, 5840.747, 5692.2607, 3979.099, 5934.086]
2025-09-12 16:38:10,585 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 16:38:10,598 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 9 hours, 17 minutes, 32 seconds)
2025-09-12 16:49:38,971 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:49:38,994 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:54:03,464 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5109.59668 ± 1760.234
2025-09-12 16:54:03,464 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [6271.9, 2310.7217, 5714.1724, 5764.038, 5595.5176, 6364.2173, 1065.6969, 6368.625, 5469.5063, 6171.572]
2025-09-12 16:54:03,464 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 16:54:03,473 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 9 hours, 53 seconds)
2025-09-12 17:05:18,545 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:05:18,550 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:09:46,140 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5641.87695 ± 978.349
2025-09-12 17:09:46,147 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5922.9966, 6000.776, 5229.9634, 6109.8364, 5822.947, 5694.9624, 6397.873, 6137.984, 2856.514, 6244.919]
2025-09-12 17:09:46,147 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 17:09:46,159 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 8 hours, 44 minutes, 11 seconds)
2025-09-12 17:21:10,275 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:21:10,280 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:25:37,178 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5790.21826 ± 298.703
2025-09-12 17:25:37,208 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5648.775, 5940.086, 5660.97, 6249.5474, 6035.5737, 6163.519, 5744.0537, 5322.8813, 5340.8203, 5795.958]
2025-09-12 17:25:37,208 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 17:25:37,208 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1226 [INFO]: New best (5790.22) for latency MM1Queue_a033_s075
2025-09-12 17:25:37,222 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 8 hours, 27 minutes, 1 second)
2025-09-12 17:37:07,848 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:37:07,853 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:41:33,896 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5278.72119 ± 1102.253
2025-09-12 17:41:33,902 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5758.1665, 5603.676, 5723.3145, 5577.272, 5660.101, 5945.692, 2400.7512, 5633.203, 4118.669, 6366.369]
2025-09-12 17:41:33,902 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 17:41:33,911 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 8 hours, 11 minutes, 49 seconds)
2025-09-12 17:52:52,281 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:52:52,285 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:57:18,907 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5107.22168 ± 1395.964
2025-09-12 17:57:18,909 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5571.1895, 2087.3533, 5341.1133, 5768.9937, 5846.4727, 5701.9023, 2635.9673, 6062.61, 6174.6367, 5881.98]
2025-09-12 17:57:18,909 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 17:57:18,920 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 7 hours, 54 minutes, 49 seconds)
2025-09-12 18:08:50,259 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:08:50,270 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:13:15,008 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5436.45557 ± 883.353
2025-09-12 18:13:15,015 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5487.547, 5811.352, 4998.9707, 5753.0146, 5824.2617, 6523.122, 5812.912, 4554.8975, 3331.898, 6266.58]
2025-09-12 18:13:15,015 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 18:13:15,036 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 7 hours, 39 minutes, 19 seconds)
2025-09-12 18:24:50,479 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:24:50,488 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:29:18,318 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4825.28711 ± 1928.854
2025-09-12 18:29:18,320 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5397.819, 6246.5684, 5987.6206, 1754.2589, 5494.7573, 5971.1323, 5417.6606, 6065.165, 5553.0635, 364.82962]
2025-09-12 18:29:18,321 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 18:29:18,335 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 7 hours, 25 minutes, 24 seconds)
2025-09-12 18:40:35,448 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:40:35,453 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:45:02,250 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5727.56152 ± 1099.272
2025-09-12 18:45:02,252 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [6945.4517, 5877.53, 5734.2017, 5620.716, 2795.128, 5673.053, 6396.016, 5598.9463, 5682.5713, 6952.002]
2025-09-12 18:45:02,252 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 18:45:02,261 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 7 hours, 8 minutes, 51 seconds)
2025-09-12 18:56:51,665 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:56:51,676 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:01:17,708 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 6108.97559 ± 294.373
2025-09-12 19:01:17,710 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [6341.7144, 6021.787, 6033.4326, 6179.319, 6667.2188, 5921.915, 6058.6865, 6319.2217, 5477.1543, 6069.304]
2025-09-12 19:01:17,710 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 19:01:17,710 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1226 [INFO]: New best (6108.98) for latency MM1Queue_a033_s075
2025-09-12 19:01:17,750 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 6 hours, 54 minutes, 35 seconds)
2025-09-12 19:12:53,755 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:12:53,761 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:17:21,170 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 6078.03320 ± 434.500
2025-09-12 19:17:21,178 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [6680.6655, 6231.3315, 5201.49, 6447.7554, 6071.1646, 5807.0166, 6470.8647, 6136.3643, 5495.4937, 6238.1885]
2025-09-12 19:17:21,178 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 19:17:21,190 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 6 hours, 40 minutes, 11 seconds)
2025-09-12 19:29:04,282 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:29:04,287 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:33:31,744 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5604.15723 ± 1092.334
2025-09-12 19:33:31,745 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5875.8545, 5994.718, 5433.7637, 6469.8174, 2524.749, 6313.191, 6356.6787, 5462.347, 5434.798, 6175.655]
2025-09-12 19:33:31,745 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 19:33:31,756 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 6 hours, 25 minutes, 20 seconds)
2025-09-12 19:45:09,674 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:45:09,679 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:50:37,522 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5330.20117 ± 1563.544
2025-09-12 19:50:37,523 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5852.123, 5931.746, 5963.375, 5938.3936, 6363.694, 5304.484, 5619.4697, 5738.356, 702.888, 5887.4844]
2025-09-12 19:50:37,523 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 19:50:37,555 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 6 hours, 14 minutes, 4 seconds)
2025-09-12 20:02:09,072 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:02:09,076 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:07:34,839 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4172.68457 ± 2709.580
2025-09-12 20:07:34,840 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5596.716, 5480.5537, 18.542635, 75.11235, 6104.1055, 6383.667, 5758.388, 63.52179, 6177.803, 6068.435]
2025-09-12 20:07:34,840 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:07:34,852 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 6 hours, 3 minutes, 11 seconds)
2025-09-12 20:19:25,566 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:19:25,576 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:23:53,792 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5982.96143 ± 274.681
2025-09-12 20:23:53,802 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [6141.7695, 6120.2676, 6204.9453, 6102.2803, 5588.4614, 5597.71, 6426.3555, 5775.427, 6158.484, 5713.9146]
2025-09-12 20:23:53,802 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:23:53,815 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 5 hours, 46 minutes, 55 seconds)
2025-09-12 20:35:39,526 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:35:39,531 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:41:08,665 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5615.43359 ± 1787.399
2025-09-12 20:41:08,692 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [6848.9165, 5884.04, 5707.205, 6533.9946, 6134.5063, 5872.9727, 342.7211, 6108.7524, 6434.5283, 6286.701]
2025-09-12 20:41:08,692 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:41:08,703 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 5 hours, 35 minutes, 10 seconds)
2025-09-12 20:52:47,361 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:52:47,373 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:57:19,027 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 6309.62256 ± 356.301
2025-09-12 20:57:19,027 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [6565.167, 5702.945, 6507.685, 6035.1777, 6330.468, 6727.418, 5809.6895, 6746.2666, 6106.948, 6564.459]
2025-09-12 20:57:19,027 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:57:19,027 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1226 [INFO]: New best (6309.62) for latency MM1Queue_a033_s075
2025-09-12 20:57:19,036 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 5 hours, 18 minutes, 23 seconds)
2025-09-12 21:09:09,763 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:09:09,767 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:13:39,838 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5527.51562 ± 1404.339
2025-09-12 21:13:39,845 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1371.8241, 6086.524, 6547.966, 5887.6, 5642.5986, 5869.2524, 5866.409, 6100.975, 5816.795, 6085.211]
2025-09-12 21:13:39,845 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:13:39,865 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 4 hours, 58 minutes, 56 seconds)
2025-09-12 21:25:29,797 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:25:29,807 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:31:01,584 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5529.29346 ± 1587.202
2025-09-12 21:31:01,586 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [6846.7607, 6545.7515, 1447.1367, 3902.921, 6108.397, 6535.6357, 6644.889, 6242.772, 5433.142, 5585.5264]
2025-09-12 21:31:01,586 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:31:01,597 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 4 hours, 43 minutes, 42 seconds)
2025-09-12 21:42:52,966 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:42:52,972 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:48:26,703 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5776.93066 ± 677.878
2025-09-12 21:48:26,709 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4160.204, 6094.491, 6274.0034, 6472.896, 6264.203, 6031.6284, 5694.1357, 6034.994, 4900.0034, 5842.749]
2025-09-12 21:48:26,709 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:48:26,725 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 4 hours, 30 minutes, 33 seconds)
2025-09-12 22:00:34,340 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:00:34,350 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:05:04,329 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5769.90088 ± 317.386
2025-09-12 22:05:04,330 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5749.7754, 5966.872, 5842.6245, 6104.6436, 5605.978, 5044.2593, 5850.754, 5804.6094, 5496.2124, 6233.2803]
2025-09-12 22:05:04,330 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:05:04,342 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 4 hours, 11 minutes, 46 seconds)
2025-09-12 22:17:04,309 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:17:04,313 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:21:49,417 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5322.43896 ± 942.142
2025-09-12 22:21:49,418 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5530.268, 5748.397, 5766.8125, 5318.3423, 5174.761, 5629.3545, 5673.7534, 5583.158, 6194.788, 2604.7551]
2025-09-12 22:21:49,418 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:21:49,435 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 3 hours, 56 minutes, 37 seconds)
2025-09-12 22:33:57,634 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:33:57,639 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:38:31,697 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5680.30566 ± 1371.360
2025-09-12 22:38:31,699 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [6586.269, 6250.0244, 6535.3105, 5370.124, 6288.122, 1700.2699, 5909.707, 6307.3994, 5761.962, 6093.868]
2025-09-12 22:38:31,699 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:38:31,740 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 3 hours, 40 minutes, 38 seconds)
2025-09-12 22:50:21,071 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:50:21,083 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:54:50,205 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 6249.16895 ± 337.256
2025-09-12 22:54:50,207 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [6210.502, 6401.3345, 5937.3765, 6649.7007, 6222.775, 6785.0933, 6375.9956, 5651.6387, 5841.3823, 6415.8896]
2025-09-12 22:54:50,207 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:54:50,219 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 3 hours, 21 minutes, 8 seconds)
2025-09-12 23:06:39,324 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:06:39,329 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 23:11:09,763 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5791.14697 ± 1143.269
2025-09-12 23:11:09,765 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [6082.927, 6446.0166, 2677.7393, 6500.9565, 5062.6475, 6658.681, 6029.408, 6331.908, 6599.936, 5521.252]
2025-09-12 23:11:09,765 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 23:11:09,792 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 3 hours, 1 minute, 58 seconds)
2025-09-12 23:22:40,327 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:22:40,337 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 23:27:07,049 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4923.64844 ± 2069.725
2025-09-12 23:27:07,051 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [6914.175, 6465.326, 6345.0117, 2627.313, 6194.242, 5698.772, 528.9282, 2919.9375, 6825.9614, 4716.819]
2025-09-12 23:27:07,051 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 23:27:07,086 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 2 hours, 44 minutes, 5 seconds)
2025-09-12 23:38:20,369 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:38:20,379 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 23:42:45,038 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5917.47168 ± 911.845
2025-09-12 23:42:45,039 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [6330.015, 6494.3857, 4432.9917, 5908.015, 6920.001, 3968.8062, 6063.614, 6611.2036, 5981.6733, 6464.013]
2025-09-12 23:42:45,039 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 23:42:45,053 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 2 hours, 25 minutes, 40 seconds)
2025-09-12 23:54:01,072 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:54:01,082 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 23:58:29,970 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5948.11035 ± 1122.407
2025-09-12 23:58:29,973 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [6559.606, 6695.2925, 6231.537, 6207.526, 6642.9785, 5992.667, 2679.6436, 6122.3413, 6504.411, 5845.1]
2025-09-12 23:58:29,973 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 23:58:29,991 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 2 hours, 7 minutes, 57 seconds)
2025-09-13 00:11:02,488 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:11:02,492 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 00:15:43,278 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5662.75488 ± 1649.849
2025-09-13 00:15:43,286 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5470.6597, 6700.0034, 929.4993, 6206.101, 6391.6016, 6783.5347, 5769.422, 6416.4775, 6628.202, 5332.0503]
2025-09-13 00:15:43,286 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 00:15:43,300 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 53 minutes, 14 seconds)
2025-09-13 00:28:16,488 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:28:16,493 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 00:32:54,125 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5364.01465 ± 1715.359
2025-09-13 00:32:54,143 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [429.87732, 6465.0454, 4858.883, 6175.9434, 5665.534, 6005.744, 6326.931, 6058.116, 6380.0825, 5273.9927]
2025-09-13 00:32:54,143 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 00:32:54,176 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 38 minutes, 5 seconds)
2025-09-13 00:45:28,498 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:45:28,503 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 00:50:08,066 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5982.83740 ± 1392.420
2025-09-13 00:50:08,074 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5951.8716, 6599.0005, 6357.7534, 7108.2104, 6138.2725, 6320.3936, 5896.6987, 2007.1062, 6200.8467, 7248.222]
2025-09-13 00:50:08,074 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 00:50:08,089 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 23 minutes, 1 second)
2025-09-13 01:02:42,818 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:02:42,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 01:07:21,263 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5035.78223 ± 2156.619
2025-09-13 01:07:21,264 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [6617.0557, 6867.272, 1960.2592, 2787.0913, 885.4586, 6321.5586, 6747.211, 6638.119, 6356.764, 5177.027]
2025-09-13 01:07:21,264 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 01:07:21,276 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 1 hour, 7 minutes, 40 seconds)
2025-09-13 01:20:02,529 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:20:02,540 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 01:24:43,435 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5124.07422 ± 2196.801
2025-09-13 01:24:43,437 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [6408.9243, 1517.4755, 6469.244, 5772.697, 5582.7075, 6447.5923, 6615.2266, 6598.8276, 153.44023, 5674.607]
2025-09-13 01:24:43,437 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 01:24:43,457 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 51 minutes, 44 seconds)
2025-09-13 01:37:25,886 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:37:25,891 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 01:42:04,960 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5665.48535 ± 1855.291
2025-09-13 01:42:04,961 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5920.6685, 6248.82, 6406.694, 5983.017, 135.88876, 6372.657, 6664.2314, 6176.193, 6492.624, 6254.0635]
2025-09-13 01:42:04,961 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 01:42:04,973 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 34 minutes, 32 seconds)
2025-09-13 01:54:47,825 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:54:47,830 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 02:00:31,809 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5341.16748 ± 1679.482
2025-09-13 02:00:31,811 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [6119.5015, 5436.419, 843.29156, 5984.8467, 6427.5586, 6092.783, 3791.59, 6558.3257, 5738.951, 6418.4087]
2025-09-13 02:00:31,811 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 02:00:31,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 17 minutes, 31 seconds)
2025-09-13 02:13:15,335 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:13:15,347 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 02:17:56,690 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5256.33594 ± 2332.856
2025-09-13 02:17:56,691 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5941.519, 6786.552, 6520.0674, 1104.7837, 5991.7974, 6487.8438, 6734.9556, 6404.0293, 170.91412, 6420.9]
2025-09-13 02:17:56,691 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 02:17:56,704 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1251 [DEBUG]: Training session finished
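For offline analysis of a finished run like this one, the per-iteration reward summaries can be pulled out of the log with a short regex pass. The pattern below is a hypothetical sketch matched to the line format shown above (the `delayed_mdp.py:1221` summary lines); it is not part of the latency_env codebase.

```python
import re

# Matches summary lines such as:
#   "... [DEBUG]: Total Reward: 5769.90088 ± 317.386"
PATTERN = re.compile(r"Total Reward: ([0-9.]+) \u00b1 ([0-9.]+)")

def parse_rewards(log_text: str) -> list[tuple[float, float]]:
    """Return (mean, std) pairs in the order they appear in the log."""
    return [(float(m), float(s)) for m, s in PATTERN.findall(log_text)]

sample = ("2025-09-12 22:05:04,329 latency_env.delayed_mdp:training_loop"
          "(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: "
          "Total Reward: 5769.90088 \u00b1 317.386")
print(parse_rewards(sample))  # [(5769.90088, 317.386)]
```

Feeding the whole file through `parse_rewards` yields one pair per iteration, which makes the late-training pattern here easy to plot: the mean hovers around 5000-6000 while the standard deviation is driven by occasional collapsed rollouts (e.g. 135.9 and 528.9 among otherwise ~6000-reward episodes).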
