2025-09-11 23:14:45,055 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noiseperc5-halfcheetah/MM1Queue_a033_s075-mbpac_memdelay
2025-09-11 23:14:45,055 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noiseperc5-halfcheetah/MM1Queue_a033_s075-mbpac_memdelay
2025-09-11 23:14:45,055 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x14fb6cac5250>}
2025-09-11 23:14:45,055 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1111 [DEBUG]: using device: cuda
2025-09-11 23:14:45,060 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1133 [INFO]: Creating new trainer
2025-09-11 23:14:45,077 baseline-mbpac-noiseperc5-halfcheetah:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-09-11 23:14:45,077 baseline-mbpac-noiseperc5-halfcheetah:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-11 23:14:45,086 baseline-mbpac-noiseperc5-halfcheetah:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=6, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 384, batch_first=True)
)
2025-09-11 23:14:45,993 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1194 [DEBUG]: Starting training session...
2025-09-11 23:14:45,993 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 1/100
2025-09-11 23:25:26,197 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:25:26,206 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-11 23:29:52,235 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: -373.68903 ± 35.697
2025-09-11 23:29:52,236 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [-387.10654, -332.88885, -412.03568, -346.03058, -409.65613, -428.44397, -350.47397, -394.78925, -317.71738, -357.7482]
2025-09-11 23:29:52,236 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 23:29:52,236 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1226 [INFO]: New best (-373.69) for latency MM1Queue_a033_s075
2025-09-11 23:29:52,246 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 24 hours, 55 minutes, 18 seconds)
2025-09-11 23:41:32,612 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:41:32,617 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-11 23:45:57,441 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: -110.05721 ± 62.800
2025-09-11 23:45:57,447 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [-101.54409, -45.782467, -227.41125, -148.63173, -205.36546, -98.907486, -66.28006, -23.521124, -109.971054, -73.15729]
2025-09-11 23:45:57,447 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 23:45:57,447 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1226 [INFO]: New best (-110.06) for latency MM1Queue_a033_s075
2025-09-11 23:45:57,452 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 25 hours, 28 minutes, 21 seconds)
2025-09-11 23:57:35,184 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:57:35,189 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 00:02:01,490 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 408.39194 ± 70.402
2025-09-12 00:02:01,492 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [324.85568, 363.07654, 422.44427, 431.72357, 353.13483, 500.7352, 365.3809, 377.91208, 565.21423, 379.442]
2025-09-12 00:02:01,492 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 00:02:01,492 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1226 [INFO]: New best (408.39) for latency MM1Queue_a033_s075
2025-09-12 00:02:01,499 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 25 hours, 28 minutes, 1 second)
2025-09-12 00:13:40,126 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:13:40,130 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 00:18:02,458 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1396.10913 ± 614.465
2025-09-12 00:18:02,460 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1937.0332, 1643.5508, 1845.9991, 201.24998, 627.61084, 1582.8218, 1586.4121, 1895.4005, 1979.814, 661.1995]
2025-09-12 00:18:02,460 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 00:18:02,460 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1226 [INFO]: New best (1396.11) for latency MM1Queue_a033_s075
2025-09-12 00:18:02,472 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 25 hours, 18 minutes, 35 seconds)
2025-09-12 00:29:30,709 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:29:30,720 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 00:33:52,301 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1617.97070 ± 851.882
2025-09-12 00:33:52,302 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [2340.7773, 2590.7554, 1166.181, 624.8604, 1333.0199, 645.57745, 292.44247, 2422.1572, 2384.3118, 2379.6243]
2025-09-12 00:33:52,302 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 00:33:52,302 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1226 [INFO]: New best (1617.97) for latency MM1Queue_a033_s075
2025-09-12 00:33:52,307 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 25 hours, 2 minutes, 59 seconds)
2025-09-12 00:45:15,295 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:45:15,300 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 00:49:35,912 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2267.84595 ± 755.550
2025-09-12 00:49:35,914 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [2591.011, 1482.8365, 1237.2123, 2448.5723, 810.01886, 3113.9204, 2760.9907, 2885.088, 2894.8037, 2454.0063]
2025-09-12 00:49:35,914 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 00:49:35,914 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1226 [INFO]: New best (2267.85) for latency MM1Queue_a033_s075
2025-09-12 00:49:35,922 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 24 hours, 58 minutes, 53 seconds)
2025-09-12 01:01:00,558 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:01:00,569 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:05:25,229 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2297.07886 ± 972.962
2025-09-12 01:05:25,230 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [3134.2249, 804.9728, 3358.2976, 1193.2896, 2171.3203, 2830.668, 711.4759, 2602.5344, 2916.7178, 3247.2896]
2025-09-12 01:05:25,230 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 01:05:25,230 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1226 [INFO]: New best (2297.08) for latency MM1Queue_a033_s075
2025-09-12 01:05:25,237 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 24 hours, 38 minutes)
2025-09-12 01:16:56,525 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:16:56,529 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:21:18,564 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2753.47803 ± 995.697
2025-09-12 01:21:18,565 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [3190.7117, 3339.8638, 3287.3323, 3112.0747, 253.84177, 2742.8865, 3261.7668, 1507.0433, 3325.4373, 3513.823]
2025-09-12 01:21:18,565 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 01:21:18,565 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1226 [INFO]: New best (2753.48) for latency MM1Queue_a033_s075
2025-09-12 01:21:18,570 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 24 hours, 18 minutes, 50 seconds)
2025-09-12 01:33:06,497 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:33:06,502 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:37:29,549 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2657.24023 ± 1027.820
2025-09-12 01:37:29,556 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [3428.2512, 3331.0474, 3267.1301, 3570.3193, 2086.818, 756.8906, 1622.2761, 1378.7305, 3659.7542, 3471.1865]
2025-09-12 01:37:29,556 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 01:37:29,560 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 24 hours, 6 minutes, 1 second)
2025-09-12 01:49:06,165 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:49:06,170 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:54:33,475 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2749.23633 ± 1090.764
2025-09-12 01:54:33,476 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [396.52, 3917.7688, 3177.8137, 2838.1116, 1945.132, 3570.6658, 1400.1064, 3811.8193, 3437.187, 2997.2373]
2025-09-12 01:54:33,476 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 01:54:33,484 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 24 hours, 12 minutes, 21 seconds)
2025-09-12 02:05:59,391 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:05:59,396 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:10:20,153 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3165.75244 ± 787.525
2025-09-12 02:10:20,159 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [3653.8936, 1394.49, 3767.7588, 1856.5845, 3280.8909, 3607.7664, 3380.562, 3580.3108, 3555.0283, 3580.2385]
2025-09-12 02:10:20,159 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 02:10:20,159 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1226 [INFO]: New best (3165.75) for latency MM1Queue_a033_s075
2025-09-12 02:10:20,164 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 23 hours, 57 minutes, 7 seconds)
2025-09-12 02:21:54,508 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:21:54,518 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:26:19,067 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3362.67432 ± 1026.049
2025-09-12 02:26:19,069 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [3920.9412, 3746.6777, 348.59982, 3485.8381, 3581.038, 3860.2473, 3366.071, 3494.0676, 3747.9321, 4075.3298]
2025-09-12 02:26:19,069 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 02:26:19,069 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1226 [INFO]: New best (3362.67) for latency MM1Queue_a033_s075
2025-09-12 02:26:19,076 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 23 hours, 43 minutes, 47 seconds)
2025-09-12 02:37:47,121 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:37:47,126 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:42:07,935 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3668.71045 ± 713.703
2025-09-12 02:42:07,937 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [3946.5984, 4154.838, 2891.059, 3992.2478, 3866.8738, 4104.5244, 3911.0886, 4005.5813, 4027.2732, 1787.0183]
2025-09-12 02:42:07,937 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 02:42:07,937 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1226 [INFO]: New best (3668.71) for latency MM1Queue_a033_s075
2025-09-12 02:42:07,945 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 23 hours, 26 minutes, 19 seconds)
2025-09-12 02:53:33,043 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:53:33,053 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:57:56,852 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3435.06567 ± 1074.801
2025-09-12 02:57:56,854 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [320.07486, 3900.7522, 3807.4016, 3089.6304, 3953.542, 3806.2625, 3586.1577, 3697.5828, 4041.7832, 4147.4688]
2025-09-12 02:57:56,854 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 02:57:56,860 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 23 hours, 3 minutes, 49 seconds)
2025-09-12 03:09:21,853 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:09:21,858 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:13:43,568 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3935.92969 ± 174.788
2025-09-12 03:13:43,569 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [3958.641, 3916.648, 4037.4463, 3890.6448, 4076.5168, 3909.6506, 4207.108, 4081.3235, 3603.4949, 3677.8215]
2025-09-12 03:13:43,569 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 03:13:43,569 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1226 [INFO]: New best (3935.93) for latency MM1Queue_a033_s075
2025-09-12 03:13:43,575 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 22 hours, 25 minutes, 51 seconds)
2025-09-12 03:25:15,179 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:25:15,184 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:29:37,698 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3807.09058 ± 876.709
2025-09-12 03:29:37,699 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4203.7383, 4228.931, 1196.5023, 4063.4163, 4025.376, 3956.5251, 4246.1196, 3941.7444, 4025.0925, 4183.4595]
2025-09-12 03:29:37,699 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 03:29:37,708 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 22 hours, 12 minutes, 6 seconds)
2025-09-12 03:41:01,039 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:41:01,044 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:45:22,491 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3818.13867 ± 736.042
2025-09-12 03:45:22,503 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [3994.3945, 3912.6238, 4162.333, 3895.5757, 4344.0527, 3888.3086, 4247.7397, 4058.699, 1653.2731, 4024.3877]
2025-09-12 03:45:22,503 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 03:45:22,510 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 21 hours, 52 minutes, 21 seconds)
2025-09-12 03:56:55,740 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:56:55,745 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:02:26,598 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3733.03467 ± 922.525
2025-09-12 04:02:26,607 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4026.8252, 1027.0691, 4156.2246, 4275.8677, 4052.6223, 3921.3584, 3544.185, 4212.0903, 4145.0176, 3969.0906]
2025-09-12 04:02:26,607 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 04:02:26,615 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 21 hours, 57 minutes, 6 seconds)
2025-09-12 04:14:04,224 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:14:04,235 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:18:28,706 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3877.81299 ± 444.263
2025-09-12 04:18:28,708 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4002.2888, 3967.7185, 4075.1377, 3668.1506, 3895.392, 4283.261, 4254.758, 4015.6628, 2640.1277, 3975.632]
2025-09-12 04:18:28,708 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 04:18:28,716 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 21 hours, 44 minutes, 36 seconds)
2025-09-12 04:29:54,307 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:29:54,312 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:34:19,137 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3308.56836 ± 1183.397
2025-09-12 04:34:19,138 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [3912.3152, 4190.414, 4435.142, 3793.743, 3851.0923, 2424.1646, 556.2104, 4081.65, 3870.922, 1970.0325]
2025-09-12 04:34:19,138 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 04:34:19,149 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 21 hours, 29 minutes, 29 seconds)
2025-09-12 04:45:57,332 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:45:57,336 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:50:22,216 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4133.94434 ± 161.509
2025-09-12 04:50:22,218 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4202.5835, 3859.2507, 4126.7275, 4226.697, 3891.5254, 4225.1177, 4235.718, 3976.679, 4387.8804, 4207.2646]
2025-09-12 04:50:22,218 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 04:50:22,218 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1226 [INFO]: New best (4133.94) for latency MM1Queue_a033_s075
2025-09-12 04:50:22,229 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 21 hours, 15 minutes, 43 seconds)
2025-09-12 05:02:08,858 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:02:08,868 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:06:33,938 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3589.19849 ± 1148.507
2025-09-12 05:06:33,939 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [3327.5652, 4173.0083, 967.9623, 1837.29, 4326.9697, 4146.652, 4216.699, 4309.246, 4090.13, 4496.4604]
2025-09-12 05:06:33,939 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 05:06:33,949 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 21 hours, 6 minutes, 34 seconds)
2025-09-12 05:18:20,046 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:18:20,050 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:22:44,410 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3200.98242 ± 1352.154
2025-09-12 05:22:44,417 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4484.4897, 1678.9507, 3505.8572, 4277.6343, 4054.0632, 1961.8939, 4269.737, 3930.0854, 3636.238, 210.87524]
2025-09-12 05:22:44,417 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 05:22:44,430 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 20 hours, 36 minutes, 34 seconds)
2025-09-12 05:34:13,194 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:34:13,204 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:39:42,856 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3729.11914 ± 1031.710
2025-09-12 05:39:42,857 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4469.543, 4284.399, 3751.0967, 4089.8315, 1403.0693, 4233.107, 2036.7693, 4375.9966, 4284.5566, 4362.8228]
2025-09-12 05:39:42,857 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 05:39:42,864 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 20 hours, 34 minutes, 47 seconds)
2025-09-12 05:51:29,129 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:51:29,135 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:55:49,734 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3779.76807 ± 859.813
2025-09-12 05:55:49,736 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1424.8639, 4148.5127, 4512.5894, 4104.799, 4233.9136, 4446.3438, 4050.2239, 4040.4946, 3504.7847, 3331.1526]
2025-09-12 05:55:49,736 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 05:55:49,744 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 20 hours, 22 minutes, 38 seconds)
2025-09-12 06:07:20,175 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:07:20,181 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:11:44,486 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4076.77734 ± 455.828
2025-09-12 06:11:44,487 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4209.691, 4530.2783, 4099.124, 4330.8174, 2858.5625, 4491.3984, 4292.7246, 4187.091, 3870.3015, 3897.7869]
2025-09-12 06:11:44,487 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 06:11:44,493 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 20 hours, 4 minutes, 17 seconds)
2025-09-12 06:23:21,300 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:23:21,305 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:27:46,562 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3763.13818 ± 1100.440
2025-09-12 06:27:46,563 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4003.2888, 538.80524, 3941.6245, 4245.7974, 3689.5598, 4127.8735, 4604.4727, 4234.0747, 3944.2222, 4301.6626]
2025-09-12 06:27:46,563 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 06:27:46,569 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 19 hours, 45 minutes, 40 seconds)
2025-09-12 06:39:18,419 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:39:18,424 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:43:41,971 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4109.65137 ± 237.659
2025-09-12 06:43:41,972 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4165.602, 3956.533, 4265.891, 3582.385, 4359.084, 4368.078, 4252.9253, 3837.9382, 4219.3247, 4088.7493]
2025-09-12 06:43:41,972 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 06:43:41,989 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 19 hours, 25 minutes, 48 seconds)
2025-09-12 06:55:29,712 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:55:29,717 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:00:57,642 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3961.88818 ± 997.374
2025-09-12 07:00:57,649 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4219.6357, 3829.777, 1048.3767, 4260.3623, 4361.505, 4465.751, 4098.469, 4358.9, 4221.149, 4754.957]
2025-09-12 07:00:57,649 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 07:00:57,658 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 19 hours, 13 minutes, 42 seconds)
2025-09-12 07:12:35,023 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:12:35,033 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:17:00,059 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4066.32666 ± 202.055
2025-09-12 07:17:00,060 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4090.851, 4426.381, 4000.8352, 3752.8105, 3920.8604, 3912.584, 3922.084, 4161.5835, 4092.6045, 4382.6724]
2025-09-12 07:17:00,060 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 07:17:00,080 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 18 hours, 56 minutes, 24 seconds)
2025-09-12 07:28:35,737 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:28:35,742 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:32:58,706 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3680.83252 ± 1072.905
2025-09-12 07:32:58,707 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [685.5576, 4278.354, 3049.5708, 3815.2695, 3963.436, 4518.82, 4495.707, 3906.5745, 4048.6643, 4046.3716]
2025-09-12 07:32:58,707 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 07:32:58,718 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 18 hours, 41 minutes, 4 seconds)
2025-09-12 07:44:22,149 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:44:22,155 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:48:47,191 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3886.74487 ± 914.655
2025-09-12 07:48:47,192 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4347.217, 4343.0093, 1184.4985, 4263.5986, 4202.0195, 4114.857, 3988.5317, 3855.419, 4177.031, 4391.264]
2025-09-12 07:48:47,192 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 07:48:47,201 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 18 hours, 21 minutes, 44 seconds)
2025-09-12 08:00:38,718 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:00:38,724 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:05:00,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3937.20508 ± 1027.126
2025-09-12 08:05:00,093 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4253.6, 4026.2993, 4364.09, 887.2591, 4311.3647, 4533.7695, 4235.068, 4377.3296, 4349.181, 4034.0903]
2025-09-12 08:05:00,093 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 08:05:00,104 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 18 hours, 9 minutes, 26 seconds)
2025-09-12 08:16:32,194 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:16:32,200 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:20:52,070 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3871.19922 ± 1135.596
2025-09-12 08:20:52,071 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4638.5557, 3147.5247, 4571.957, 698.8618, 4562.453, 4148.954, 4383.697, 4081.9917, 4457.8965, 4020.1035]
2025-09-12 08:20:52,071 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 08:20:52,084 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 17 hours, 34 minutes, 46 seconds)
2025-09-12 08:32:09,611 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:32:09,615 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:36:26,269 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3574.66553 ± 1469.931
2025-09-12 08:36:26,270 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4295.7075, -8.691079, 4041.759, 4396.561, 4528.311, 1468.3579, 4228.495, 3951.1938, 4429.7925, 4415.167]
2025-09-12 08:36:26,270 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 08:36:26,289 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 17 hours, 12 minutes, 40 seconds)
2025-09-12 08:47:27,932 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:47:27,936 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:51:41,475 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3679.76318 ± 1291.951
2025-09-12 08:51:41,476 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4108.0786, 4446.014, 3996.3918, 1832.9224, 579.58234, 4398.174, 4573.221, 4043.4663, 4787.8745, 4031.9055]
2025-09-12 08:51:41,476 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 08:51:41,485 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 16 hours, 47 minutes, 31 seconds)
2025-09-12 09:02:57,899 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:02:57,908 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:07:14,691 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4028.41357 ± 864.599
2025-09-12 09:07:14,692 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4719.3833, 4371.1577, 4111.4375, 4091.3577, 4521.246, 4398.319, 4408.6777, 3400.6055, 1657.8323, 4604.121]
2025-09-12 09:07:14,692 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 09:07:14,699 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 16 hours, 28 minutes, 34 seconds)
2025-09-12 09:18:32,984 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:18:32,988 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:22:46,590 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4080.99146 ± 629.960
2025-09-12 09:22:46,591 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4734.235, 3993.3203, 3996.1099, 4573.547, 4484.519, 4255.3574, 4274.6567, 4019.093, 4143.309, 2335.7678]
2025-09-12 09:22:46,591 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 09:22:46,599 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 16 hours, 4 minutes, 24 seconds)
2025-09-12 09:34:03,200 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:34:03,204 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:39:20,488 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4168.86670 ± 247.244
2025-09-12 09:39:20,489 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4505.638, 3658.4626, 4158.5063, 4425.117, 4413.051, 4192.6504, 3995.2112, 4322.3857, 4065.64, 3952.0095]
2025-09-12 09:39:20,489 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 09:39:20,489 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1226 [INFO]: New best (4168.87) for latency MM1Queue_a033_s075
2025-09-12 09:39:20,497 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 15 hours, 57 minutes, 22 seconds)
2025-09-12 09:50:51,843 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:50:51,847 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:55:05,191 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4287.85693 ± 149.482
2025-09-12 09:55:05,204 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4420.434, 4041.7178, 4411.326, 4377.3135, 4240.6963, 4495.587, 4387.66, 4293.0117, 4112.645, 4098.1743]
2025-09-12 09:55:05,204 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 09:55:05,204 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1226 [INFO]: New best (4287.86) for latency MM1Queue_a033_s075
2025-09-12 09:55:05,213 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 15 hours, 43 minutes, 47 seconds)
2025-09-12 10:06:12,089 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:06:12,094 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:10:25,549 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4357.72998 ± 215.515
2025-09-12 10:10:25,556 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [3955.8943, 4440.0576, 4354.3423, 4376.3413, 3999.0288, 4410.996, 4561.3374, 4521.4263, 4661.585, 4296.289]
2025-09-12 10:10:25,556 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 10:10:25,556 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1226 [INFO]: New best (4357.73) for latency MM1Queue_a033_s075
2025-09-12 10:10:25,572 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 15 hours, 29 minutes, 4 seconds)
2025-09-12 10:21:30,165 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:21:30,175 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:25:45,195 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4260.76074 ± 275.860
2025-09-12 10:25:45,198 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4534.241, 3999.609, 4332.1953, 3769.668, 4340.2827, 3930.7493, 4149.0166, 4545.1167, 4654.076, 4352.655]
2025-09-12 10:25:45,198 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 10:25:45,208 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 15 hours, 10 minutes, 41 seconds)
2025-09-12 10:36:52,507 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:36:52,512 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:41:05,021 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4039.03589 ± 844.974
2025-09-12 10:41:05,022 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4161.2393, 4126.8584, 4543.37, 4428.35, 4536.6772, 4040.3308, 4691.286, 4289.2383, 1595.3093, 3977.7]
2025-09-12 10:41:05,022 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 10:41:05,029 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 14 hours, 52 minutes, 42 seconds)
2025-09-12 10:52:21,406 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:52:21,411 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:57:39,514 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3915.55933 ± 919.033
2025-09-12 10:57:39,522 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4537.3833, 4695.458, 3304.6995, 3707.4033, 4561.7666, 4026.2603, 4115.3013, 1441.3046, 4438.4136, 4327.6035]
2025-09-12 10:57:39,522 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 10:57:39,531 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 14 hours, 37 minutes, 9 seconds)
2025-09-12 11:08:44,645 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:08:44,654 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:14:04,533 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3865.48706 ± 711.219
2025-09-12 11:14:04,534 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4228.651, 4006.2202, 3177.2864, 3936.7341, 4519.362, 4471.43, 3596.771, 4121.2065, 4492.945, 2104.2646]
2025-09-12 11:14:04,534 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 11:14:04,547 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 14 hours, 28 minutes, 52 seconds)
2025-09-12 11:25:21,675 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:25:21,680 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:29:39,133 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3703.77881 ± 1409.196
2025-09-12 11:29:39,134 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4316.7246, 71.8769, 4349.0747, 4600.131, 2017.3114, 4139.239, 4123.5435, 4406.414, 4308.1523, 4705.3213]
2025-09-12 11:29:39,134 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 11:29:39,144 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 14 hours, 15 minutes, 38 seconds)
2025-09-12 11:41:02,155 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:41:02,159 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:45:16,053 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4391.49121 ± 305.976
2025-09-12 11:45:16,053 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4844.181, 4504.9688, 4976.5854, 4107.2764, 4280.9497, 4363.1177, 3893.9536, 4218.6445, 4322.714, 4402.5146]
2025-09-12 11:45:16,053 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 11:45:16,053 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1226 [INFO]: New best (4391.49) for latency MM1Queue_a033_s075
2025-09-12 11:45:16,066 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 14 hours, 2 minutes, 51 seconds)
2025-09-12 11:56:24,488 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:56:24,492 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:00:38,444 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4492.99072 ± 262.957
2025-09-12 12:00:38,446 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4631.891, 4834.661, 4657.447, 4569.384, 4707.4785, 4572.846, 4093.8767, 3954.5242, 4548.4287, 4359.3667]
2025-09-12 12:00:38,446 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 12:00:38,446 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1226 [INFO]: New best (4492.99) for latency MM1Queue_a033_s075
2025-09-12 12:00:38,457 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 13 hours, 47 minutes, 23 seconds)
2025-09-12 12:11:43,082 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:11:43,087 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:16:00,524 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4436.97119 ± 236.776
2025-09-12 12:16:00,531 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4700.5786, 4497.222, 3952.3142, 4600.4546, 4735.0464, 4297.2207, 4115.47, 4483.9526, 4441.2886, 4546.161]
2025-09-12 12:16:00,531 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 12:16:00,545 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 13 hours, 19 minutes, 10 seconds)
2025-09-12 12:27:16,685 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:27:16,696 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:31:33,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3798.15576 ± 1232.392
2025-09-12 12:31:33,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4483.6353, 3832.2827, 2374.0537, 4481.6978, 4253.953, 4480.838, 654.56433, 4244.9116, 4918.6416, 4256.9795]
2025-09-12 12:31:33,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 12:31:33,132 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 12 hours, 54 minutes, 45 seconds)
2025-09-12 12:42:43,690 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:42:43,694 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:46:57,641 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4500.09082 ± 219.622
2025-09-12 12:46:57,642 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4614.833, 4526.0894, 4766.3184, 4386.2046, 4467.7856, 4253.3877, 4819.3047, 4039.5269, 4581.4233, 4546.0312]
2025-09-12 12:46:57,642 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 12:46:57,642 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1226 [INFO]: New best (4500.09) for latency MM1Queue_a033_s075
2025-09-12 12:46:57,654 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 12 hours, 37 minutes, 37 seconds)
2025-09-12 12:58:03,165 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:58:03,175 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:02:20,606 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3767.83521 ± 1312.367
2025-09-12 13:02:20,619 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4300.6763, 2335.1663, 3836.578, 4336.521, 4448.9414, 4472.863, 4625.8115, 4451.942, 4537.3164, 332.53632]
2025-09-12 13:02:20,619 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 13:02:20,633 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 12 hours, 19 minutes, 55 seconds)
2025-09-12 13:13:36,240 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:13:36,245 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:17:52,140 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4094.07739 ± 878.871
2025-09-12 13:17:52,141 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4458.1816, 4173.6025, 4785.4976, 4506.7104, 4175.377, 4281.6953, 3973.66, 4309.6514, 1555.9973, 4720.402]
2025-09-12 13:17:52,141 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 13:17:52,175 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 12 hours, 5 minutes, 56 seconds)
2025-09-12 13:28:59,438 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:28:59,449 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:33:14,886 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4310.46094 ± 448.737
2025-09-12 13:33:14,889 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4375.3813, 4187.642, 3345.0435, 4719.526, 4446.716, 4801.7183, 4614.5923, 4312.6304, 3654.936, 4646.424]
2025-09-12 13:33:14,889 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 13:33:14,912 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 11 hours, 50 minutes, 36 seconds)
2025-09-12 13:44:25,628 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:44:25,632 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:48:45,249 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4281.61230 ± 1128.341
2025-09-12 13:48:45,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4662.4897, 4605.0386, 4439.7617, 5122.781, 4289.095, 5022.45, 4447.7505, 4676.285, 4575.053, 975.4187]
2025-09-12 13:48:45,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 13:48:45,269 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 11 hours, 34 minutes, 49 seconds)
2025-09-12 13:59:58,199 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:59:58,209 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:05:17,330 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4315.41357 ± 899.502
2025-09-12 14:05:17,331 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1700.0537, 4901.592, 4718.2686, 4631.335, 4473.625, 4272.914, 4807.4272, 4175.895, 4703.3257, 4769.703]
2025-09-12 14:05:17,331 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 14:05:17,343 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 11 hours, 29 minutes, 17 seconds)
2025-09-12 14:16:44,502 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:16:44,508 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:21:01,944 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4088.46436 ± 957.396
2025-09-12 14:21:01,946 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4496.312, 1922.253, 4519.9414, 4718.0938, 4385.92, 4569.6655, 4453.597, 2524.1099, 4963.079, 4331.672]
2025-09-12 14:21:01,946 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 14:21:01,955 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 11 hours, 16 minutes, 43 seconds)
2025-09-12 14:32:13,472 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:32:13,477 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:36:30,129 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4270.60840 ± 688.264
2025-09-12 14:36:30,130 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4325.1924, 4485.2085, 4265.569, 4630.8716, 2317.0793, 4519.1606, 4639.2163, 5005.925, 4218.069, 4299.795]
2025-09-12 14:36:30,130 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 14:36:30,139 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 11 hours, 30 seconds)
2025-09-12 14:47:41,692 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:47:41,703 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:53:00,814 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4075.84814 ± 1180.146
2025-09-12 14:53:00,815 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4362.7646, 4465.84, 4614.04, 4687.923, 4422.882, 4057.573, 4660.2524, 4640.772, 4265.082, 581.35236]
2025-09-12 14:53:00,816 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 14:53:00,831 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 10 hours, 54 minutes, 4 seconds)
2025-09-12 15:04:20,411 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:04:20,418 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:08:38,067 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4045.04688 ± 976.401
2025-09-12 15:08:38,069 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1156.2303, 4194.1704, 4431.6445, 4289.684, 4278.2476, 4226.819, 4135.6553, 4634.9727, 4558.973, 4544.0684]
2025-09-12 15:08:38,069 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 15:08:38,087 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 10 hours, 39 minutes, 2 seconds)
2025-09-12 15:19:55,961 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:19:55,968 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:24:10,466 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4413.58545 ± 270.589
2025-09-12 15:24:10,473 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4463.8413, 4024.9927, 4217.771, 4309.2, 4248.2393, 4400.005, 5030.056, 4415.38, 4739.028, 4287.3423]
2025-09-12 15:24:10,473 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 15:24:10,482 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 10 hours, 15 minutes, 18 seconds)
2025-09-12 15:35:34,759 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:35:34,773 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:39:48,728 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4185.20459 ± 1002.831
2025-09-12 15:39:48,729 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4525.463, 1223.0398, 4254.697, 4874.6743, 4605.0605, 4306.9136, 4545.3267, 4434.8525, 4699.1196, 4382.9004]
2025-09-12 15:39:48,729 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 15:39:48,746 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 9 hours, 58 minutes, 43 seconds)
2025-09-12 15:50:56,573 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:50:56,578 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:55:13,703 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3881.64966 ± 1311.362
2025-09-12 15:55:13,705 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4173.614, 760.1864, 4281.2773, 4613.066, 1921.3593, 4553.077, 4870.9106, 4584.295, 4344.53, 4714.178]
2025-09-12 15:55:13,705 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 15:55:13,725 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 9 hours, 42 minutes, 34 seconds)
2025-09-12 16:06:18,579 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:06:18,596 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:10:32,636 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4517.38965 ± 316.857
2025-09-12 16:10:32,636 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4577.0684, 4413.3975, 4278.3, 4548.5234, 4232.21, 4797.12, 4021.5168, 4794.421, 4345.972, 5165.369]
2025-09-12 16:10:32,636 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 16:10:32,636 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1226 [INFO]: New best (4517.39) for latency MM1Queue_a033_s075
2025-09-12 16:10:32,645 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 9 hours, 18 minutes, 13 seconds)
2025-09-12 16:21:49,767 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:21:49,772 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:26:07,551 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4512.37207 ± 723.979
2025-09-12 16:26:07,553 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4029.4731, 4916.6196, 5129.607, 4403.472, 5112.0933, 4493.6724, 2586.773, 4761.048, 5069.4185, 4621.547]
2025-09-12 16:26:07,553 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 16:26:07,575 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 9 hours, 2 minutes, 26 seconds)
2025-09-12 16:37:13,135 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:37:13,141 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:41:28,545 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4098.98535 ± 1138.559
2025-09-12 16:41:28,546 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4781.9077, 4530.464, 4437.202, 4564.112, 2274.2222, 4852.211, 4645.9214, 4963.3457, 4468.2905, 1472.1782]
2025-09-12 16:41:28,546 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 16:41:28,556 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 8 hours, 45 minutes, 38 seconds)
2025-09-12 16:52:55,226 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:52:55,232 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:57:08,803 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3379.42236 ± 1837.110
2025-09-12 16:57:08,810 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4217.294, 14.758571, 4303.0054, 5060.986, 675.8837, 4504.208, 1222.805, 4992.699, 4146.616, 4655.9663]
2025-09-12 16:57:08,810 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 16:57:08,824 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 8 hours, 30 minutes, 24 seconds)
2025-09-12 17:08:41,834 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:08:41,838 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:12:59,596 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4460.36816 ± 902.089
2025-09-12 17:12:59,597 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4824.9185, 5143.8037, 4576.3105, 4653.304, 4200.326, 1856.7208, 4720.5273, 4863.0864, 5047.2866, 4717.394]
2025-09-12 17:12:59,597 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 17:12:59,614 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 8 hours, 17 minutes, 41 seconds)
2025-09-12 17:24:17,347 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:24:17,354 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:28:33,087 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4617.65479 ± 561.315
2025-09-12 17:28:33,088 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4415.0317, 4995.764, 4143.5034, 5338.4644, 4931.272, 3299.7988, 5108.404, 4466.0894, 4955.8247, 4522.394]
2025-09-12 17:28:33,088 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 17:28:33,088 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1226 [INFO]: New best (4617.65) for latency MM1Queue_a033_s075
2025-09-12 17:28:33,103 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 8 hours, 3 minutes, 38 seconds)
2025-09-12 17:39:41,007 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:39:41,012 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:43:57,616 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3848.24609 ± 1824.098
2025-09-12 17:43:57,617 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4654.342, 4418.8438, 4986.6724, 4808.062, 4926.7188, 401.9831, 4659.43, 4865.197, 4730.1533, 31.060265]
2025-09-12 17:43:57,617 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 17:43:57,647 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 7 hours, 47 minutes)
2025-09-12 17:55:20,664 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:55:20,675 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:59:36,527 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4156.44824 ± 1244.177
2025-09-12 17:59:36,529 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [3511.8176, 4825.216, 3094.2134, 4058.5896, 4714.771, 5231.322, 4723.874, 1057.1367, 5258.574, 5088.9653]
2025-09-12 17:59:36,529 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 17:59:36,541 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 7 hours, 33 minutes, 10 seconds)
2025-09-12 18:10:43,747 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:10:43,754 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:15:01,776 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3707.96362 ± 1729.577
2025-09-12 18:15:01,778 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4624.192, 4873.8037, 4358.2563, 794.0319, 2045.9575, 4848.11, 5236.08, 4898.9287, 584.1683, 4816.1104]
2025-09-12 18:15:01,778 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 18:15:01,793 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 7 hours, 16 minutes, 8 seconds)
2025-09-12 18:26:20,837 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:26:20,842 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:31:38,811 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3622.94067 ± 1677.207
2025-09-12 18:31:38,812 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4524.576, 4379.9863, 572.3869, 4492.4976, 4512.284, 5163.225, 4745.1978, 419.34882, 2714.6765, 4705.225]
2025-09-12 18:31:38,812 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 18:31:38,826 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 7 hours, 4 minutes, 43 seconds)
2025-09-12 18:42:57,692 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:42:57,699 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:47:13,316 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4257.28418 ± 1120.219
2025-09-12 18:47:13,317 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4864.85, 4394.227, 4824.7554, 5136.0425, 4787.935, 4807.903, 1615.7019, 4827.054, 2548.3184, 4766.054]
2025-09-12 18:47:13,317 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 18:47:13,329 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 6 hours, 49 minutes, 5 seconds)
2025-09-12 18:58:27,134 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:58:27,139 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:02:40,986 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3844.58447 ± 1700.778
2025-09-12 19:02:40,987 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [5072.0786, 5175.3896, 2382.0798, 1706.2018, 4393.6865, 4637.1353, 109.08701, 4731.3867, 5210.9893, 5027.8066]
2025-09-12 19:02:40,987 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 19:02:40,999 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 6 hours, 33 minutes, 36 seconds)
2025-09-12 19:13:47,599 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:13:47,614 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:18:04,513 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4139.05225 ± 1566.666
2025-09-12 19:18:04,520 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1172.8242, 4697.852, 4937.0137, 4944.9297, 5039.4697, 4943.0513, 865.52594, 5147.9697, 4935.989, 4705.8965]
2025-09-12 19:18:04,520 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 19:18:04,530 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 6 hours, 16 minutes, 38 seconds)
2025-09-12 19:29:10,956 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:29:10,968 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:33:25,127 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4635.16016 ± 964.124
2025-09-12 19:33:25,127 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1870.2954, 4946.172, 4666.3403, 4493.831, 5407.945, 4732.609, 4742.4907, 4939.4326, 5261.4834, 5290.9985]
2025-09-12 19:33:25,127 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 19:33:25,127 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1226 [INFO]: New best (4635.16) for latency MM1Queue_a033_s075
2025-09-12 19:33:25,138 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 6 hours, 35 seconds)
2025-09-12 19:44:44,695 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:44:44,702 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:49:01,664 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4750.71533 ± 425.535
2025-09-12 19:49:01,665 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [5312.5723, 4616.5986, 4757.398, 3738.355, 5192.996, 4735.971, 4754.7983, 4393.371, 4972.2563, 5032.8325]
2025-09-12 19:49:01,665 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 19:49:01,665 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1226 [INFO]: New best (4750.72) for latency MM1Queue_a033_s075
2025-09-12 19:49:01,678 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 5 hours, 40 minutes, 28 seconds)
2025-09-12 20:00:19,874 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:00:19,884 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:04:36,056 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4653.85107 ± 1502.780
2025-09-12 20:04:36,057 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4753.4043, 4575.205, 4722.2393, 5646.354, 5297.5225, 5461.572, 5286.3447, 5280.299, 254.95663, 5260.6147]
2025-09-12 20:04:36,057 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:04:36,070 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 5 hours, 24 minutes, 59 seconds)
2025-09-12 20:16:02,273 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:16:02,281 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:20:16,514 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 5064.65430 ± 259.456
2025-09-12 20:20:16,514 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4896.761, 4783.525, 5514.484, 5102.7695, 5352.1934, 4777.1685, 4958.775, 5233.417, 4738.0938, 5289.356]
2025-09-12 20:20:16,514 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:20:16,514 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1226 [INFO]: New best (5064.65) for latency MM1Queue_a033_s075
2025-09-12 20:20:16,526 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 5 hours, 10 minutes, 22 seconds)
2025-09-12 20:31:36,076 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:31:36,083 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:35:53,257 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4475.08398 ± 511.293
2025-09-12 20:35:53,259 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4514.4194, 5001.153, 4885.093, 5079.276, 4300.6045, 3773.8076, 3936.169, 4800.6875, 4847.7803, 3611.8477]
2025-09-12 20:35:53,259 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:35:53,275 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 4 hours, 55 minutes, 41 seconds)
2025-09-12 20:47:17,602 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:47:17,607 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:51:32,357 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4998.49707 ± 219.983
2025-09-12 20:51:32,358 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [5233.1743, 4629.876, 5142.5728, 5187.665, 4794.3765, 4976.1924, 4793.812, 5304.8057, 5128.234, 4794.2563]
2025-09-12 20:51:32,358 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:51:32,384 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 4 hours, 41 minutes, 14 seconds)
2025-09-12 21:02:45,773 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:02:45,778 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:07:00,407 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4343.62012 ± 1315.475
2025-09-12 21:07:00,409 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4813.832, 1150.3413, 4684.656, 4852.86, 5292.9927, 5433.6265, 4883.404, 5014.8794, 2486.3262, 4823.277]
2025-09-12 21:07:00,409 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:07:00,434 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 4 hours, 25 minutes, 7 seconds)
2025-09-12 21:18:09,132 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:18:09,137 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:23:27,070 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4433.53369 ± 1183.381
2025-09-12 21:23:27,072 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4981.6123, 4484.1133, 5178.602, 4946.0703, 5129.51, 5000.894, 4090.2896, 4299.9062, 5167.546, 1056.7894]
2025-09-12 21:23:27,072 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:23:27,085 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 4 hours, 12 minutes, 19 seconds)
2025-09-12 21:34:40,669 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:34:40,675 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:38:57,228 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3638.42505 ± 2053.628
2025-09-12 21:38:57,229 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [185.19833, 4593.099, 5462.001, 1309.8518, 4810.3086, 4542.918, 178.91606, 5160.0566, 4997.504, 5144.395]
2025-09-12 21:38:57,229 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:38:57,241 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 3 hours, 56 minutes, 2 seconds)
2025-09-12 21:50:06,250 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:50:06,255 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:54:20,883 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4607.66162 ± 1030.287
2025-09-12 21:54:20,883 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [5177.616, 5086.5464, 5126.529, 1616.7391, 4419.527, 4876.414, 4484.9126, 5129.98, 5132.7476, 5025.6055]
2025-09-12 21:54:20,883 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:54:20,902 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 3 hours, 39 minutes, 41 seconds)
2025-09-12 22:05:27,184 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:05:27,189 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:10:46,610 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4268.76367 ± 1014.796
2025-09-12 22:10:46,612 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [5383.657, 4889.4927, 4413.6157, 2388.9424, 4355.5874, 4984.487, 2551.2927, 4975.6836, 5122.2314, 3622.6492]
2025-09-12 22:10:46,612 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:10:46,632 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 3 hours, 26 minutes, 1 second)
2025-09-12 22:22:06,070 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:22:06,076 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:26:23,063 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4745.49121 ± 530.508
2025-09-12 22:26:23,064 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4486.6475, 4795.063, 5066.229, 3419.4841, 4944.4834, 4659.8467, 4481.607, 5157.4175, 4954.4277, 5489.7046]
2025-09-12 22:26:23,064 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:26:23,079 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 3 hours, 10 minutes, 30 seconds)
2025-09-12 22:37:36,603 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:37:36,609 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:41:51,600 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4817.64893 ± 899.300
2025-09-12 22:41:51,601 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [5033.0444, 5346.581, 5237.2827, 4706.418, 5126.6494, 5301.987, 5107.898, 2178.6172, 4885.1274, 5252.884]
2025-09-12 22:41:51,601 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:41:51,616 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 2 hours, 52 minutes, 29 seconds)
2025-09-12 22:52:58,398 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:52:58,404 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:57:13,632 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 3845.33789 ± 1598.681
2025-09-12 22:57:13,632 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [5118.1475, 4702.543, 5243.145, 2924.7366, 4885.5796, 5343.521, 1652.9954, 878.2728, 2535.9749, 5168.4604]
2025-09-12 22:57:13,632 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:57:13,654 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 2 hours, 36 minutes, 32 seconds)
2025-09-12 23:08:31,571 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:08:31,576 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 23:12:45,432 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4385.90479 ± 1489.630
2025-09-12 23:12:45,433 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [-40.80112, 5130.425, 4922.1953, 5001.441, 4924.9917, 5006.104, 4915.862, 4713.3784, 4344.2607, 4941.1924]
2025-09-12 23:12:45,433 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 23:12:45,446 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 2 hours, 21 minutes, 8 seconds)
2025-09-12 23:24:00,316 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:24:00,321 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 23:29:18,875 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4754.04395 ± 1574.968
2025-09-12 23:29:18,877 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [5122.755, 5058.0376, 5587.1445, 5331.8057, 4618.0747, 5520.0293, 102.58257, 5236.3774, 5570.5776, 5393.0537]
2025-09-12 23:29:18,877 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 23:29:18,890 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 2 hours, 5 minutes, 39 seconds)
2025-09-12 23:40:43,939 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:40:43,950 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 23:45:00,865 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4231.89990 ± 990.060
2025-09-12 23:45:00,867 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [3532.1711, 5385.791, 4688.847, 4661.2188, 5325.825, 1832.5499, 3631.239, 4147.1724, 4391.315, 4722.8706]
2025-09-12 23:45:00,867 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 23:45:00,885 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 50 minutes, 4 seconds)
2025-09-12 23:56:08,393 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:56:08,397 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 00:00:20,715 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4849.41504 ± 647.168
2025-09-13 00:00:20,716 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [5065.5957, 5084.991, 4891.8916, 5122.334, 4615.244, 3025.8503, 4788.3555, 5331.2896, 5196.7334, 5371.868]
2025-09-13 00:00:20,716 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 00:00:20,730 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 34 minutes, 10 seconds)
2025-09-13 00:11:18,919 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:11:18,923 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 00:16:35,417 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 5064.41699 ± 238.591
2025-09-13 00:16:35,423 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [5333.3154, 4819.829, 5278.6978, 4966.941, 5086.1514, 4495.7544, 5230.235, 5211.5054, 5063.892, 5157.8535]
2025-09-13 00:16:35,423 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 00:16:35,442 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 19 minutes, 21 seconds)
2025-09-13 00:27:32,144 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:27:32,154 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 00:32:48,056 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4867.09863 ± 1336.185
2025-09-13 00:32:48,057 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [5602.6724, 935.1719, 5414.1157, 5270.6406, 5524.77, 5457.599, 5486.8906, 5271.6416, 5028.858, 4678.623]
2025-09-13 00:32:48,057 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 00:32:48,074 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 1 hour, 4 minutes, 2 seconds)
2025-09-13 00:44:14,640 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:44:14,644 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 00:48:28,357 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4562.10986 ± 934.500
2025-09-13 00:48:28,358 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [6027.4043, 4903.312, 5303.5845, 4666.244, 4898.6943, 4579.978, 5241.9185, 3243.9292, 2749.0242, 4007.0112]
2025-09-13 00:48:28,358 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 00:48:28,373 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 47 minutes, 29 seconds)
2025-09-13 00:59:28,725 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:59:28,730 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 01:03:42,812 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4806.75879 ± 1235.449
2025-09-13 01:03:42,818 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [5454.9253, 5830.6387, 5114.579, 4963.117, 4915.798, 1184.7076, 5212.856, 4923.2197, 5209.2886, 5258.4575]
2025-09-13 01:03:42,818 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 01:03:42,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 31 minutes, 28 seconds)
2025-09-13 01:14:33,696 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:14:33,707 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 01:18:43,533 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4648.52881 ± 1029.212
2025-09-13 01:18:43,534 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4992.5864, 5473.859, 4967.567, 4947.5654, 2109.4724, 3458.976, 5802.433, 4776.687, 4682.218, 5273.922]
2025-09-13 01:18:43,534 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 01:18:43,554 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 15 minutes, 40 seconds)
2025-09-13 01:29:51,765 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:29:51,775 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 01:34:02,263 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 4923.37646 ± 545.437
2025-09-13 01:34:02,264 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [5304.535, 4358.8076, 5168.003, 3533.1453, 5265.5317, 5273.7163, 5245.412, 5235.6606, 4727.7915, 5121.159]
2025-09-13 01:34:02,265 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 01:34:02,284 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-halfcheetah):1251 [DEBUG]: Training session finished
