2025-09-11 23:54:01,030 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noiseperc20-halfcheetah/MM1Queue_a033_s075-mbpac_memdelay
2025-09-11 23:54:01,030 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noiseperc20-halfcheetah/MM1Queue_a033_s075-mbpac_memdelay
2025-09-11 23:54:01,030 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x153ab48f2150>}
2025-09-11 23:54:01,030 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1111 [DEBUG]: using device: cuda
2025-09-11 23:54:01,037 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1133 [INFO]: Creating new trainer
2025-09-11 23:54:01,055 baseline-mbpac-noiseperc20-halfcheetah:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-09-11 23:54:01,055 baseline-mbpac-noiseperc20-halfcheetah:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-11 23:54:01,063 baseline-mbpac-noiseperc20-halfcheetah:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=6, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 384, batch_first=True)
)
2025-09-11 23:54:02,135 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1194 [DEBUG]: Starting training session...
2025-09-11 23:54:02,135 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 1/100
2025-09-12 00:04:56,812 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:04:56,817 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 00:09:48,428 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: -289.12186 ± 20.399
2025-09-12 00:09:48,431 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [-330.14743, -306.1545, -252.63249, -271.93484, -283.79803, -296.24896, -304.68173, -274.51318, -284.85605, -286.25125]
2025-09-12 00:09:48,431 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 00:09:48,431 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (-289.12) for latency MM1Queue_a033_s075
2025-09-12 00:09:48,437 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 26 hours, 1 minute, 23 seconds)
2025-09-12 00:21:58,680 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:21:58,686 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 00:26:49,606 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: -249.94612 ± 29.061
2025-09-12 00:26:49,632 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [-272.0848, -205.11868, -266.2356, -212.76213, -270.78564, -241.47964, -279.49405, -294.59073, -227.43465, -229.47542]
2025-09-12 00:26:49,632 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 00:26:49,632 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (-249.95) for latency MM1Queue_a033_s075
2025-09-12 00:26:49,642 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 26 hours, 46 minutes, 47 seconds)
2025-09-12 00:39:00,002 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:39:00,013 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 00:43:50,992 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: -39.38508 ± 28.450
2025-09-12 00:43:50,993 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [-95.81384, -19.788654, -38.69353, -13.238929, -59.353745, -53.063316, -4.1915517, -8.387574, -30.630703, -70.688965]
2025-09-12 00:43:50,993 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 00:43:50,993 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (-39.39) for latency MM1Queue_a033_s075
2025-09-12 00:43:51,025 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 26 hours, 50 minutes, 40 seconds)
2025-09-12 00:56:02,110 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:56:02,115 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:00:54,556 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: -0.16134 ± 42.486
2025-09-12 01:00:54,558 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [18.408552, 47.99347, 16.011803, 13.921612, -76.91362, -83.411095, 23.860743, 2.8846188, 38.707947, -3.0774794]
2025-09-12 01:00:54,558 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 01:00:54,558 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (-0.16) for latency MM1Queue_a033_s075
2025-09-12 01:00:54,563 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 26 hours, 44 minutes, 58 seconds)
2025-09-12 01:13:05,054 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:13:05,060 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:17:54,314 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 213.96036 ± 69.693
2025-09-12 01:17:54,316 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [200.23962, 238.01297, 178.0404, 142.62209, 138.33742, 251.18974, 215.48933, 174.08801, 394.7296, 206.85446]
2025-09-12 01:17:54,316 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 01:17:54,316 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (213.96) for latency MM1Queue_a033_s075
2025-09-12 01:17:54,323 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 26 hours, 33 minutes, 31 seconds)
2025-09-12 01:30:05,932 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:30:05,943 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:34:55,669 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 487.95117 ± 76.101
2025-09-12 01:34:55,671 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [441.85574, 484.71643, 472.2231, 442.59012, 546.4592, 493.19186, 325.6716, 609.8924, 581.6174, 481.29413]
2025-09-12 01:34:55,671 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 01:34:55,671 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (487.95) for latency MM1Queue_a033_s075
2025-09-12 01:34:55,678 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 26 hours, 40 minutes, 16 seconds)
2025-09-12 01:47:09,491 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:47:09,497 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 01:51:59,191 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 522.03015 ± 112.006
2025-09-12 01:51:59,193 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [361.67136, 436.41077, 642.5221, 463.79953, 498.3643, 505.45166, 719.05286, 546.288, 392.50247, 654.23865]
2025-09-12 01:51:59,193 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 01:51:59,193 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (522.03) for latency MM1Queue_a033_s075
2025-09-12 01:51:59,205 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 26 hours, 23 minutes, 57 seconds)
2025-09-12 02:04:11,635 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:04:11,646 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:09:01,180 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 589.38800 ± 229.647
2025-09-12 02:09:01,181 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [823.5001, 1000.4206, 531.1754, 872.8379, 679.06335, 416.42105, 309.934, 378.14798, 353.757, 528.6225]
2025-09-12 02:09:01,181 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 02:09:01,181 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (589.39) for latency MM1Queue_a033_s075
2025-09-12 02:09:01,186 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 26 hours, 7 minutes, 6 seconds)
2025-09-12 02:21:13,955 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:21:13,960 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:26:04,688 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 644.73926 ± 246.757
2025-09-12 02:26:04,704 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [387.6471, 593.11505, 1005.5627, 844.28937, 425.36462, 511.6047, 971.37054, 499.40573, 898.8129, 310.21976]
2025-09-12 02:26:04,704 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 02:26:04,704 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (644.74) for latency MM1Queue_a033_s075
2025-09-12 02:26:04,726 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 25 hours, 50 minutes, 4 seconds)
2025-09-12 02:38:21,301 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:38:21,312 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 02:43:12,149 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 629.60101 ± 261.331
2025-09-12 02:43:12,150 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [504.86942, 963.912, 483.3962, 1059.135, 571.12213, 513.7293, 299.6279, 499.7193, 386.27478, 1014.2242]
2025-09-12 02:43:12,150 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 02:43:12,156 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 25 hours, 35 minutes, 20 seconds)
2025-09-12 02:55:25,577 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:55:25,591 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:00:17,492 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 638.43121 ± 237.520
2025-09-12 03:00:17,494 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1043.9221, 825.2255, 320.7324, 701.4331, 616.3082, 409.4421, 659.8693, 394.63376, 443.8032, 968.9423]
2025-09-12 03:00:17,494 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 03:00:17,500 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 25 hours, 19 minutes, 28 seconds)
2025-09-12 03:12:28,341 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:12:28,350 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:17:16,951 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1173.42664 ± 209.254
2025-09-12 03:17:16,959 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1314.8699, 1148.8099, 1336.0437, 1182.6927, 620.26025, 1401.6539, 1326.6779, 1169.7181, 1153.7258, 1079.8143]
2025-09-12 03:17:16,959 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 03:17:16,959 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (1173.43) for latency MM1Queue_a033_s075
2025-09-12 03:17:16,969 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 25 hours, 1 minute, 12 seconds)
2025-09-12 03:29:19,979 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:29:19,996 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:34:06,765 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 939.40588 ± 346.693
2025-09-12 03:34:06,766 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [515.1992, 507.0739, 1038.4407, 661.6813, 1361.6548, 435.80453, 1177.8206, 1243.8495, 1259.0309, 1193.5026]
2025-09-12 03:34:06,767 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 03:34:06,814 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 24 hours, 40 minutes, 37 seconds)
2025-09-12 03:46:08,884 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:46:08,890 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:50:58,728 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1066.09924 ± 408.525
2025-09-12 03:50:58,730 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [841.9649, 1268.025, 1365.3511, 1430.5967, 1379.4326, 395.62692, 508.1516, 596.80566, 1424.2863, 1450.7524]
2025-09-12 03:50:58,730 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 03:50:58,747 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 24 hours, 20 minutes, 17 seconds)
2025-09-12 04:03:01,979 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:03:01,996 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:07:51,832 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1357.25049 ± 263.591
2025-09-12 04:07:51,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1586.8888, 1398.248, 1604.7548, 1454.1603, 1530.2627, 1518.4503, 1000.63153, 1408.3541, 1332.7198, 738.0356]
2025-09-12 04:07:51,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 04:07:51,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (1357.25) for latency MM1Queue_a033_s075
2025-09-12 04:07:51,847 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 23 hours, 59 minutes, 14 seconds)
2025-09-12 04:19:55,555 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:19:55,561 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:24:46,071 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1414.45239 ± 148.901
2025-09-12 04:24:46,078 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1489.0216, 1277.21, 1445.4681, 1349.5333, 1328.2661, 1515.6532, 1084.9469, 1521.6727, 1502.1892, 1630.5638]
2025-09-12 04:24:46,078 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 04:24:46,078 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (1414.45) for latency MM1Queue_a033_s075
2025-09-12 04:24:46,083 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 23 hours, 39 minutes, 12 seconds)
2025-09-12 04:36:49,456 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:36:49,478 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:41:38,605 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1435.96875 ± 247.192
2025-09-12 04:41:38,607 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1589.2615, 1472.5072, 1688.444, 1555.4879, 744.88684, 1433.0688, 1480.8354, 1499.6738, 1337.7083, 1557.8145]
2025-09-12 04:41:38,607 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 04:41:38,607 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (1435.97) for latency MM1Queue_a033_s075
2025-09-12 04:41:38,615 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 23 hours, 20 minutes, 23 seconds)
2025-09-12 04:53:41,465 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:53:41,471 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:58:32,632 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1210.66577 ± 368.831
2025-09-12 04:58:32,634 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [473.99496, 1472.0369, 1462.3607, 1539.5717, 1400.7456, 1611.2312, 1298.3835, 1252.2467, 923.16345, 672.9215]
2025-09-12 04:58:32,634 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 04:58:32,642 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 23 hours, 4 minutes, 39 seconds)
2025-09-12 05:10:36,233 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:10:36,238 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:15:23,257 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1560.93555 ± 89.750
2025-09-12 05:15:23,265 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1507.3763, 1536.5782, 1631.1536, 1767.0806, 1435.4823, 1601.5759, 1587.5903, 1550.8381, 1457.211, 1534.47]
2025-09-12 05:15:23,265 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 05:15:23,265 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (1560.94) for latency MM1Queue_a033_s075
2025-09-12 05:15:23,272 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 22 hours, 47 minutes, 25 seconds)
2025-09-12 05:27:26,750 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:27:26,761 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:32:13,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1512.59204 ± 255.469
2025-09-12 05:32:13,123 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1638.1101, 1461.3328, 861.4605, 1369.5742, 1609.6714, 1625.6571, 1556.3888, 1871.931, 1690.203, 1441.5906]
2025-09-12 05:32:13,123 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 05:32:13,133 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 22 hours, 29 minutes, 40 seconds)
2025-09-12 05:44:16,841 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:44:16,848 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:49:05,942 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1614.56616 ± 225.663
2025-09-12 05:49:05,942 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1630.1122, 1643.4015, 1587.5673, 1792.1997, 1757.6108, 1699.1355, 1812.7268, 1598.2299, 976.14594, 1648.5319]
2025-09-12 05:49:05,943 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 05:49:05,943 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (1614.57) for latency MM1Queue_a033_s075
2025-09-12 05:49:05,951 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 22 hours, 12 minutes, 25 seconds)
2025-09-12 06:01:08,641 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:01:08,647 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:05:55,931 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1502.86353 ± 434.841
2025-09-12 06:05:55,946 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1830.382, 1821.8707, 1684.9052, 1774.9409, 584.7947, 1016.88556, 1868.5018, 993.83093, 1687.3573, 1765.1658]
2025-09-12 06:05:55,946 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 06:05:55,977 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 21 hours, 54 minutes, 54 seconds)
2025-09-12 06:18:00,024 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:18:00,036 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:22:49,651 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1730.31775 ± 160.320
2025-09-12 06:22:49,652 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1682.7928, 1465.2428, 1872.9027, 1879.3324, 1556.5872, 1899.6271, 1886.4714, 1849.7207, 1524.0284, 1686.4728]
2025-09-12 06:22:49,652 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 06:22:49,652 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (1730.32) for latency MM1Queue_a033_s075
2025-09-12 06:22:49,660 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 21 hours, 37 minutes, 58 seconds)
2025-09-12 06:34:53,666 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:34:53,671 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:39:43,929 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1619.28613 ± 234.592
2025-09-12 06:39:43,935 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1695.0938, 1721.3246, 1748.2994, 968.1556, 1583.2169, 1728.6893, 1533.2574, 1836.3053, 1599.8226, 1778.6976]
2025-09-12 06:39:43,935 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 06:39:43,956 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 21 hours, 22 minutes, 2 seconds)
2025-09-12 06:51:48,479 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:51:48,484 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:56:36,860 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1675.64490 ± 166.794
2025-09-12 06:56:36,860 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1281.8424, 1555.6041, 1895.4342, 1839.5916, 1594.7596, 1676.8569, 1751.397, 1811.8445, 1637.8663, 1711.2517]
2025-09-12 06:56:36,860 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 06:56:36,867 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 21 hours, 5 minutes, 55 seconds)
2025-09-12 07:08:40,471 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:08:40,477 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:13:28,227 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1617.16663 ± 336.194
2025-09-12 07:13:28,243 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1689.4427, 1866.1794, 1592.997, 1688.1959, 2018.8712, 1650.6643, 708.1382, 1724.5739, 1447.4117, 1785.1914]
2025-09-12 07:13:28,243 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 07:13:28,266 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 20 hours, 48 minutes, 42 seconds)
2025-09-12 07:25:32,848 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:25:32,859 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:30:21,289 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1792.38477 ± 105.495
2025-09-12 07:30:21,303 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1617.5815, 1906.1918, 1766.5226, 1639.8215, 1926.4752, 1763.431, 1856.2784, 1928.9929, 1752.747, 1765.806]
2025-09-12 07:30:21,303 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 07:30:21,303 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (1792.38) for latency MM1Queue_a033_s075
2025-09-12 07:30:21,324 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 20 hours, 32 minutes, 34 seconds)
2025-09-12 07:42:24,761 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:42:24,766 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:47:13,422 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1758.61230 ± 231.563
2025-09-12 07:47:13,429 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2000.5348, 1144.0997, 1689.703, 1930.3121, 1988.166, 1750.6941, 1774.368, 1749.2279, 1856.5304, 1702.4868]
2025-09-12 07:47:13,429 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 07:47:13,437 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 20 hours, 15 minutes, 18 seconds)
2025-09-12 07:59:18,068 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:59:18,103 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:04:04,130 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1566.21753 ± 560.241
2025-09-12 08:04:04,132 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1945.5234, 351.19254, 2071.9653, 1771.5656, 1888.4384, 1935.69, 1767.3566, 1789.436, 1516.8826, 624.12463]
2025-09-12 08:04:04,132 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 08:04:04,162 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 19 hours, 57 minutes, 34 seconds)
2025-09-12 08:16:08,417 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:16:08,423 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:20:56,470 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1772.93286 ± 255.942
2025-09-12 08:20:56,482 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1619.8253, 2110.751, 1833.9807, 1766.4684, 1113.4622, 1867.8262, 1803.1578, 2008.4442, 1879.1022, 1726.3088]
2025-09-12 08:20:56,482 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 08:20:56,494 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 19 hours, 40 minutes, 34 seconds)
2025-09-12 08:33:01,958 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:33:01,965 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:37:51,149 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1809.14453 ± 67.979
2025-09-12 08:37:51,150 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1774.8646, 1804.026, 1842.9463, 1695.7349, 1782.9446, 1910.9164, 1822.1864, 1872.6411, 1705.3805, 1879.8038]
2025-09-12 08:37:51,150 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 08:37:51,150 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (1809.14) for latency MM1Queue_a033_s075
2025-09-12 08:37:51,203 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 19 hours, 24 minutes, 28 seconds)
2025-09-12 08:49:54,214 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:49:54,236 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:54:42,892 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1822.59729 ± 230.665
2025-09-12 08:54:42,893 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1764.1147, 1891.4222, 1951.9882, 1956.739, 1847.5851, 1940.1249, 1184.6613, 1725.9589, 1919.1879, 2044.1906]
2025-09-12 08:54:42,893 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 08:54:42,893 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (1822.60) for latency MM1Queue_a033_s075
2025-09-12 08:54:42,902 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 19 hours, 7 minutes, 17 seconds)
2025-09-12 09:06:48,012 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:06:48,017 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:11:36,565 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1897.43884 ± 332.718
2025-09-12 09:11:36,566 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2019.4471, 2037.0111, 2225.2273, 1884.091, 968.8829, 2110.1797, 1947.7361, 2009.4487, 2031.4725, 1740.8925]
2025-09-12 09:11:36,566 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 09:11:36,566 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (1897.44) for latency MM1Queue_a033_s075
2025-09-12 09:11:36,573 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 18 hours, 50 minutes, 46 seconds)
2025-09-12 09:23:41,115 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:23:41,139 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:28:30,657 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1859.82202 ± 98.408
2025-09-12 09:28:30,659 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2044.1494, 1897.9282, 1869.0378, 1708.7, 1752.4913, 1772.1025, 1994.8617, 1874.6592, 1852.0253, 1832.2653]
2025-09-12 09:28:30,659 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 09:28:30,669 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 18 hours, 34 minutes, 37 seconds)
2025-09-12 09:40:35,194 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:40:35,200 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:45:27,541 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1862.41333 ± 276.738
2025-09-12 09:45:27,547 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2052.823, 1925.0272, 2179.3342, 1143.2152, 2077.0315, 1825.262, 1851.1433, 1758.4005, 2057.2798, 1754.6162]
2025-09-12 09:45:27,547 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 09:45:27,556 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 18 hours, 18 minutes, 43 seconds)
2025-09-12 09:57:33,277 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:57:33,289 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:02:24,487 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1997.89221 ± 118.164
2025-09-12 10:02:24,489 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2197.3862, 2130.9082, 1787.4971, 1906.8037, 2121.144, 1929.5492, 2011.5293, 1902.0776, 2009.0238, 1983.0024]
2025-09-12 10:02:24,489 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 10:02:24,489 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (1997.89) for latency MM1Queue_a033_s075
2025-09-12 10:02:24,500 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 18 hours, 2 minutes, 18 seconds)
2025-09-12 10:14:31,142 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:14:31,149 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:19:20,762 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1795.84473 ± 326.071
2025-09-12 10:19:20,763 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [915.3584, 1866.1287, 2040.7571, 2080.1885, 1863.3956, 1802.8398, 2075.2593, 1797.6343, 1926.8276, 1590.0592]
2025-09-12 10:19:20,763 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 10:19:20,769 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 17 hours, 46 minutes, 21 seconds)
2025-09-12 10:31:25,195 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:31:25,208 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:36:16,930 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1874.71326 ± 178.811
2025-09-12 10:36:16,931 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1600.7938, 2103.3977, 1949.4343, 1665.1176, 1935.209, 2185.5178, 1979.5864, 1768.87, 1745.4097, 1813.796]
2025-09-12 10:36:16,931 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 10:36:16,937 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 17 hours, 29 minutes, 56 seconds)
2025-09-12 10:48:22,119 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:48:22,126 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:53:12,599 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1933.86401 ± 445.040
2025-09-12 10:53:12,600 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2129.2925, 2024.7775, 2049.3665, 2055.7146, 652.87573, 1899.1273, 1987.4154, 1951.1373, 2277.2927, 2311.6436]
2025-09-12 10:53:12,600 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 10:53:12,606 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 17 hours, 13 minutes, 19 seconds)
2025-09-12 11:05:18,447 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:05:18,452 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:10:08,432 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2032.63086 ± 132.106
2025-09-12 11:10:08,514 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1890.9733, 1799.1359, 2133.2373, 2115.8953, 2059.793, 2261.1187, 1982.2388, 1973.1957, 1948.5239, 2162.1982]
2025-09-12 11:10:08,514 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 11:10:08,514 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (2032.63) for latency MM1Queue_a033_s075
2025-09-12 11:10:08,523 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 16 hours, 56 minutes, 11 seconds)
2025-09-12 11:22:13,562 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:22:13,575 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:27:02,980 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1827.88403 ± 452.901
2025-09-12 11:27:02,995 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1950.3512, 2010.5168, 1926.4341, 2099.9197, 1738.6158, 2115.6096, 2063.378, 1801.8252, 2055.8877, 516.3002]
2025-09-12 11:27:02,995 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 11:27:03,017 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 16 hours, 38 minutes, 46 seconds)
2025-09-12 11:39:08,063 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:39:08,068 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:43:57,602 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1963.66821 ± 147.892
2025-09-12 11:43:57,606 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1747.012, 2068.1501, 2246.5527, 1869.6311, 1887.3821, 1821.3347, 1945.567, 2135.8853, 2046.6273, 1868.5393]
2025-09-12 11:43:57,606 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 11:43:57,619 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 16 hours, 21 minutes, 31 seconds)
2025-09-12 11:56:02,146 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:56:02,152 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:00:50,262 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1948.11035 ± 85.380
2025-09-12 12:00:50,264 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1817.2992, 2011.1852, 2065.6519, 1907.2955, 1975.62, 1845.4376, 2086.0464, 1971.1808, 1922.7278, 1878.6583]
2025-09-12 12:00:50,264 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 12:00:50,304 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 16 hours, 3 minutes, 56 seconds)
2025-09-12 12:12:54,796 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:12:54,820 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:17:42,228 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2025.41992 ± 148.581
2025-09-12 12:17:42,238 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2102.077, 1823.1469, 1927.356, 1841.9991, 2130.4045, 1899.3116, 2223.641, 1965.8799, 2072.6328, 2267.749]
2025-09-12 12:17:42,238 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 12:17:42,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 15 hours, 46 minutes, 20 seconds)
2025-09-12 12:29:46,147 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:29:46,154 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:34:36,953 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1962.13477 ± 143.236
2025-09-12 12:34:36,955 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1909.3828, 1818.4812, 1730.6132, 1874.5255, 2118.0056, 2120.6055, 1810.6624, 2029.8641, 2066.3, 2142.9055]
2025-09-12 12:34:36,955 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 12:34:36,963 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 15 hours, 29 minutes, 12 seconds)
2025-09-12 12:46:41,249 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:46:41,255 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:51:31,777 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2083.56714 ± 110.233
2025-09-12 12:51:31,792 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2191.5022, 2007.0491, 2030.7117, 2146.1917, 2272.2397, 1974.6403, 1946.1713, 2042.598, 2229.8088, 1994.7571]
2025-09-12 12:51:31,792 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 12:51:31,792 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (2083.57) for latency MM1Queue_a033_s075
2025-09-12 12:51:31,806 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 15 hours, 12 minutes, 22 seconds)
2025-09-12 13:03:37,119 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:03:37,133 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:08:26,583 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2019.40881 ± 119.493
2025-09-12 13:08:26,590 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2186.2478, 1860.5707, 2148.1567, 2001.579, 2071.565, 2121.0427, 1839.381, 2116.2205, 1929.1423, 1920.1824]
2025-09-12 13:08:26,590 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 13:08:26,601 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 14 hours, 55 minutes, 31 seconds)
2025-09-12 13:20:32,546 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:20:32,552 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:25:20,307 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1978.32458 ± 390.901
2025-09-12 13:25:20,309 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2053.3027, 2187.4766, 2223.0435, 1992.2294, 2144.0117, 2063.3918, 1885.0754, 2119.11, 2265.5122, 850.09375]
2025-09-12 13:25:20,309 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 13:25:20,335 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 14 hours, 38 minutes, 48 seconds)
2025-09-12 13:37:25,254 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:37:25,266 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:42:14,621 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1951.26343 ± 262.011
2025-09-12 13:42:14,628 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1252.0338, 2074.5212, 1956.4733, 2110.4924, 1950.4258, 2084.3103, 1930.2704, 2290.5955, 1824.628, 2038.8837]
2025-09-12 13:42:14,628 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 13:42:14,637 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 14 hours, 22 minutes, 18 seconds)
2025-09-12 13:54:19,796 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:54:19,810 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:59:10,561 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2086.63159 ± 157.548
2025-09-12 13:59:10,566 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2156.3513, 2352.187, 1832.5193, 2099.7969, 2241.026, 2120.1084, 1948.3046, 2221.886, 2005.1444, 1888.992]
2025-09-12 13:59:10,566 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 13:59:10,566 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (2086.63) for latency MM1Queue_a033_s075
2025-09-12 13:59:10,580 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 14 hours, 5 minutes, 36 seconds)
2025-09-12 14:11:16,541 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:11:16,554 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:16:03,894 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2061.76685 ± 187.435
2025-09-12 14:16:03,917 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2358.077, 1782.5486, 2197.474, 2191.9663, 2212.0068, 1897.5043, 2182.3862, 1919.3723, 1805.396, 2070.9353]
2025-09-12 14:16:03,918 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 14:16:03,952 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 13 hours, 48 minutes, 27 seconds)
2025-09-12 14:28:10,012 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:28:10,018 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:32:59,937 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1776.66467 ± 488.959
2025-09-12 14:32:59,939 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2158.4312, 922.9075, 2085.5557, 849.18695, 2092.0662, 1436.9331, 2116.8633, 2045.7369, 1907.6788, 2151.2874]
2025-09-12 14:32:59,939 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 14:32:59,953 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 13 hours, 31 minutes, 44 seconds)
2025-09-12 14:45:06,024 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:45:06,044 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:49:54,475 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2189.82764 ± 147.798
2025-09-12 14:49:54,495 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2357.7197, 2336.0808, 2223.6738, 2142.2173, 2069.3516, 1990.658, 2307.338, 1951.6062, 2384.1428, 2135.4885]
2025-09-12 14:49:54,495 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 14:49:54,495 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (2189.83) for latency MM1Queue_a033_s075
2025-09-12 14:49:54,504 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 13 hours, 14 minutes, 57 seconds)
2025-09-12 15:02:00,012 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:02:00,018 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:06:51,584 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2136.64819 ± 108.431
2025-09-12 15:06:51,586 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1991.3682, 1989.4684, 2178.5967, 2311.935, 2129.3938, 2270.5625, 2249.0312, 2039.7394, 2097.092, 2109.2961]
2025-09-12 15:06:51,586 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 15:06:51,594 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 12 hours, 58 minutes, 28 seconds)
2025-09-12 15:18:57,755 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:18:57,763 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:23:45,873 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2059.23291 ± 247.128
2025-09-12 15:23:45,875 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2086.329, 2179.171, 2296.7585, 2167.0166, 2061.347, 2120.958, 1943.8512, 2221.2405, 2146.7427, 1368.9163]
2025-09-12 15:23:45,875 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 15:23:45,910 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 12 hours, 41 minutes, 17 seconds)
2025-09-12 15:35:51,846 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:35:51,852 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:40:38,801 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2123.75806 ± 268.497
2025-09-12 15:40:38,803 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2159.3699, 2277.6013, 2283.165, 2274.4517, 2017.5776, 1366.7661, 2225.6846, 2072.4014, 2252.7397, 2307.8228]
2025-09-12 15:40:38,803 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 15:40:38,813 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 12 hours, 24 minutes, 18 seconds)
2025-09-12 15:52:45,203 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:52:45,210 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:57:35,022 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2158.77295 ± 129.028
2025-09-12 15:57:35,027 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2215.35, 2063.3586, 2007.2267, 2296.2837, 2281.519, 2160.8389, 2143.5044, 1893.166, 2311.3203, 2215.1648]
2025-09-12 15:57:35,027 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 15:57:35,046 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 12 hours, 7 minutes, 25 seconds)
2025-09-12 16:09:41,705 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:09:41,713 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:14:33,030 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2081.81909 ± 138.012
2025-09-12 16:14:33,033 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1976.7396, 2238.8079, 1952.9471, 2365.32, 2075.9758, 2084.013, 1969.0098, 2197.9526, 2050.452, 1906.9742]
2025-09-12 16:14:33,033 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 16:14:33,043 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 11 hours, 50 minutes, 59 seconds)
2025-09-12 16:26:38,519 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:26:38,539 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:31:27,328 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2138.22388 ± 150.777
2025-09-12 16:31:27,330 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1969.9521, 2051.4531, 2233.0095, 2355.2795, 2370.2288, 2039.0475, 2245.2954, 2120.1233, 1896.1339, 2101.7134]
2025-09-12 16:31:27,330 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 16:31:27,339 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 11 hours, 33 minutes, 41 seconds)
2025-09-12 16:43:31,911 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:43:31,918 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:48:22,410 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2168.14844 ± 132.590
2025-09-12 16:48:22,434 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2212.8472, 2129.7754, 2189.9043, 1913.4618, 2093.3154, 2152.1746, 2015.1934, 2366.9727, 2278.0183, 2329.8223]
2025-09-12 16:48:22,434 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 16:48:22,454 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 11 hours, 16 minutes, 52 seconds)
2025-09-12 17:00:28,116 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:00:28,134 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:05:16,860 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2234.77661 ± 66.245
2025-09-12 17:05:16,862 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2185.7847, 2135.3213, 2368.042, 2243.4934, 2188.9263, 2211.5933, 2252.3655, 2197.1426, 2331.6736, 2233.4233]
2025-09-12 17:05:16,862 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 17:05:16,862 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (2234.78) for latency MM1Queue_a033_s075
2025-09-12 17:05:16,909 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 11 hours, 9 seconds)
2025-09-12 17:17:22,293 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:17:22,300 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:22:07,964 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2142.08911 ± 144.254
2025-09-12 17:22:07,967 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2182.526, 1833.5957, 2323.9773, 2248.147, 2226.7168, 1955.9143, 2132.0674, 2052.624, 2215.0144, 2250.3086]
2025-09-12 17:22:07,967 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 17:22:07,982 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 10 hours, 42 minutes, 34 seconds)
2025-09-12 17:34:14,188 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:34:14,213 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:39:03,452 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2069.46240 ± 91.641
2025-09-12 17:39:03,453 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2147.2375, 1932.4783, 2091.5032, 2020.363, 2184.6777, 2120.2869, 2014.4135, 1921.8008, 2067.6006, 2194.2625]
2025-09-12 17:39:03,453 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 17:39:03,463 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 10 hours, 25 minutes, 21 seconds)
2025-09-12 17:51:09,484 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:51:09,490 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:55:55,797 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2280.60571 ± 174.361
2025-09-12 17:55:55,799 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2093.05, 2188.7, 2089.2285, 2165.5894, 2492.6458, 2530.4688, 2104.9553, 2310.1326, 2283.4465, 2547.8396]
2025-09-12 17:55:55,799 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 17:55:55,799 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (2280.61) for latency MM1Queue_a033_s075
2025-09-12 17:55:55,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 10 hours, 8 minutes, 13 seconds)
2025-09-12 18:08:02,739 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:08:02,745 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:12:51,348 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2263.30396 ± 120.680
2025-09-12 18:12:51,350 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2278.0654, 2133.6362, 2273.8938, 2253.3967, 2526.997, 2199.1494, 2382.8755, 2266.2524, 2256.4482, 2062.325]
2025-09-12 18:12:51,350 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 18:12:51,369 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 9 hours, 51 minutes, 22 seconds)
2025-09-12 18:24:57,106 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:24:57,118 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:29:48,595 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2214.14404 ± 125.930
2025-09-12 18:29:48,596 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2125.9883, 2469.4246, 2246.3384, 2267.6362, 1982.4132, 2264.37, 2120.7036, 2149.6938, 2190.7642, 2324.1077]
2025-09-12 18:29:48,596 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 18:29:48,616 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 9 hours, 34 minutes, 47 seconds)
2025-09-12 18:41:55,343 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:41:55,357 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:46:45,790 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2211.19873 ± 104.494
2025-09-12 18:46:45,796 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2123.1294, 2198.6846, 2136.7107, 2416.758, 2054.8418, 2351.6116, 2159.117, 2251.0989, 2160.0, 2260.0327]
2025-09-12 18:46:45,796 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 18:46:45,809 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 9 hours, 18 minutes, 33 seconds)
2025-09-12 18:58:51,351 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:58:51,357 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:03:37,766 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2216.02930 ± 172.920
2025-09-12 19:03:37,767 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2553.2856, 2164.6836, 2158.3684, 2221.5955, 2095.1602, 2036.4729, 2086.6694, 2536.6924, 2204.163, 2103.204]
2025-09-12 19:03:37,767 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 19:03:37,776 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 9 hours, 1 minute, 15 seconds)
2025-09-12 19:15:45,009 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:15:45,019 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:20:32,554 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2195.93506 ± 75.988
2025-09-12 19:20:32,556 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2067.9272, 2129.191, 2115.4006, 2255.233, 2273.0332, 2233.6914, 2317.8003, 2140.1328, 2235.264, 2191.6765]
2025-09-12 19:20:32,556 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 19:20:32,566 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 8 hours, 44 minutes, 35 seconds)
2025-09-12 19:32:40,970 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:32:40,977 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:37:30,008 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2262.58423 ± 125.085
2025-09-12 19:37:30,009 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2185.706, 2095.6594, 2471.989, 2336.2263, 2159.1235, 2192.2532, 2154.6743, 2413.091, 2406.6394, 2210.4802]
2025-09-12 19:37:30,009 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 19:37:30,021 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 8 hours, 27 minutes, 51 seconds)
2025-09-12 19:49:37,831 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:49:37,839 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:54:28,509 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2287.37329 ± 113.982
2025-09-12 19:54:28,509 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2374.412, 2354.939, 2360.2004, 2065.2122, 2301.023, 2256.9573, 2408.2695, 2139.3003, 2417.8818, 2195.5369]
2025-09-12 19:54:28,510 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 19:54:28,510 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (2287.37) for latency MM1Queue_a033_s075
2025-09-12 19:54:28,518 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 8 hours, 11 minutes, 3 seconds)
2025-09-12 20:06:36,785 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:06:36,793 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:11:26,002 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2214.09814 ± 146.390
2025-09-12 20:11:26,003 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2397.0225, 2291.463, 1947.3049, 2248.7715, 2408.2932, 2139.0688, 2140.2146, 2038.3632, 2171.5415, 2358.9387]
2025-09-12 20:11:26,003 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:11:26,014 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 7 hours, 54 minutes, 9 seconds)
2025-09-12 20:23:33,740 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:23:33,746 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:28:23,021 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2278.40259 ± 121.460
2025-09-12 20:28:23,029 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2177.1172, 2142.9194, 2287.2563, 2295.4192, 2199.61, 2484.233, 2196.9946, 2382.7341, 2467.9297, 2149.811]
2025-09-12 20:28:23,029 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:28:23,048 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 7 hours, 37 minutes, 40 seconds)
2025-09-12 20:40:30,741 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:40:30,763 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:45:18,046 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2315.02344 ± 196.580
2025-09-12 20:45:18,073 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2359.0046, 1857.476, 2348.328, 2185.911, 2324.0076, 2189.7546, 2445.0857, 2586.5637, 2313.3616, 2540.7432]
2025-09-12 20:45:18,073 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:45:18,073 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (2315.02) for latency MM1Queue_a033_s075
2025-09-12 20:45:18,089 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 7 hours, 20 minutes, 44 seconds)
2025-09-12 20:57:26,115 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:57:26,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:02:12,920 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2081.62378 ± 568.573
2025-09-12 21:02:12,922 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2177.787, 417.2325, 2403.5842, 2458.9846, 2212.9429, 2180.2446, 2036.5348, 2197.6042, 2324.2424, 2407.0828]
2025-09-12 21:02:12,922 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:02:12,936 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 7 hours, 3 minutes, 34 seconds)
2025-09-12 21:14:20,872 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:14:20,879 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:19:07,997 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2248.67432 ± 154.130
2025-09-12 21:19:07,999 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2120.5786, 2271.7175, 2222.0532, 2366.7468, 2306.0474, 2014.7687, 2111.667, 2526.2988, 2113.9106, 2432.9556]
2025-09-12 21:19:07,999 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:19:08,013 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 6 hours, 46 minutes, 21 seconds)
2025-09-12 21:31:16,046 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:31:16,051 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:36:04,130 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2371.02026 ± 163.835
2025-09-12 21:36:04,141 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2246.5308, 2675.8328, 2277.9834, 2426.8267, 2512.8176, 2129.6084, 2207.4807, 2280.0837, 2555.8267, 2397.2136]
2025-09-12 21:36:04,141 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:36:04,141 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (2371.02) for latency MM1Queue_a033_s075
2025-09-12 21:36:04,156 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 6 hours, 29 minutes, 19 seconds)
2025-09-12 21:48:12,525 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:48:12,531 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:52:59,070 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2224.38525 ± 215.390
2025-09-12 21:52:59,071 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2152.138, 2391.6157, 2219.44, 1663.6791, 2332.8784, 2364.6965, 2495.2964, 2289.269, 2147.7107, 2187.1296]
2025-09-12 21:52:59,072 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:52:59,081 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 6 hours, 12 minutes, 14 seconds)
2025-09-12 22:05:06,349 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:05:06,354 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:09:57,451 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2375.45166 ± 119.376
2025-09-12 22:09:57,453 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2418.6423, 2074.7754, 2455.6157, 2393.9976, 2238.5693, 2475.6216, 2369.6157, 2432.4216, 2469.9434, 2425.3135]
2025-09-12 22:09:57,453 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:09:57,453 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (2375.45) for latency MM1Queue_a033_s075
2025-09-12 22:09:57,487 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 5 hours, 55 minutes, 33 seconds)
2025-09-12 22:22:04,961 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:22:04,968 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:26:51,706 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2417.22314 ± 78.491
2025-09-12 22:26:51,707 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2305.9243, 2409.1118, 2377.94, 2513.796, 2596.636, 2350.8728, 2381.4243, 2412.656, 2398.9656, 2424.9065]
2025-09-12 22:26:51,707 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:26:51,707 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (2417.22) for latency MM1Queue_a033_s075
2025-09-12 22:26:51,720 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 5 hours, 38 minutes, 35 seconds)
2025-09-12 22:38:59,753 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:38:59,763 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:43:46,804 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2285.08545 ± 153.202
2025-09-12 22:43:46,808 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2376.7617, 2473.3635, 2309.242, 2008.0149, 2059.8123, 2352.2058, 2419.0168, 2325.6704, 2400.725, 2126.0452]
2025-09-12 22:43:46,808 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:43:46,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 5 hours, 21 minutes, 39 seconds)
2025-09-12 22:55:54,420 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:55:54,427 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 23:00:41,143 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2287.96753 ± 321.783
2025-09-12 23:00:41,147 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2626.2, 2383.5151, 1411.7745, 2318.926, 2389.8706, 2270.4382, 2095.3113, 2444.0386, 2519.5308, 2420.0703]
2025-09-12 23:00:41,147 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 23:00:41,158 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 5 hours, 4 minutes, 37 seconds)
2025-09-12 23:12:47,533 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:12:47,540 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 23:17:37,513 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2335.37451 ± 167.483
2025-09-12 23:17:37,514 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2017.5332, 2097.9246, 2238.6636, 2329.6758, 2344.952, 2471.2761, 2486.0706, 2587.9478, 2426.6912, 2353.009]
2025-09-12 23:17:37,514 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 23:17:37,524 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 4 hours, 47 minutes, 46 seconds)
2025-09-12 23:29:46,167 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:29:46,175 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 23:34:34,081 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2121.77100 ± 533.748
2025-09-12 23:34:34,084 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2211.3936, 2345.415, 2276.3667, 2227.941, 2432.1711, 2334.7664, 541.42505, 2147.7385, 2274.8662, 2425.6262]
2025-09-12 23:34:34,084 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 23:34:34,096 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 4 hours, 30 minutes, 45 seconds)
2025-09-12 23:46:42,389 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:46:42,397 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 23:51:32,203 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2361.69702 ± 145.593
2025-09-12 23:51:32,220 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2189.7742, 2295.8303, 2516.728, 2343.1807, 2409.5088, 2087.618, 2593.9736, 2298.9897, 2386.2268, 2495.1404]
2025-09-12 23:51:32,220 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 23:51:32,246 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 4 hours, 14 minutes, 1 second)
2025-09-13 00:03:40,285 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:03:40,298 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 00:08:31,275 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2453.68848 ± 151.685
2025-09-13 00:08:31,287 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2764.3708, 2512.5264, 2283.6455, 2333.0469, 2364.7864, 2479.6545, 2358.5952, 2325.9224, 2682.2073, 2432.1284]
2025-09-13 00:08:31,287 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 00:08:31,287 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (2453.69) for latency MM1Queue_a033_s075
2025-09-13 00:08:31,311 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 3 hours, 57 minutes, 16 seconds)
2025-09-13 00:20:40,498 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:20:40,517 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 00:25:27,887 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2412.34375 ± 140.992
2025-09-13 00:25:27,891 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2709.9622, 2417.4019, 2178.1064, 2369.934, 2499.2747, 2329.8838, 2386.8347, 2492.4666, 2489.0964, 2250.4792]
2025-09-13 00:25:27,891 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 00:25:27,909 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 3 hours, 40 minutes, 25 seconds)
2025-09-13 00:37:37,552 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:37:37,560 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 00:42:23,718 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2408.63672 ± 143.551
2025-09-13 00:42:23,724 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2269.2957, 2515.9956, 2299.9001, 2345.6196, 2674.5183, 2188.83, 2299.6567, 2552.7468, 2460.5627, 2479.2412]
2025-09-13 00:42:23,724 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 00:42:23,740 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 3 hours, 23 minutes, 26 seconds)
2025-09-13 00:54:30,962 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:54:30,967 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 00:59:18,342 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2415.82935 ± 176.749
2025-09-13 00:59:18,352 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2301.8386, 2549.245, 2416.4133, 2232.2761, 2173.7388, 2470.6475, 2164.3496, 2615.221, 2649.9905, 2584.5725]
2025-09-13 00:59:18,352 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 00:59:18,392 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 3 hours, 6 minutes, 25 seconds)
2025-09-13 01:11:28,565 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:11:28,571 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 01:16:18,736 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2483.27319 ± 184.443
2025-09-13 01:16:18,738 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2515.0327, 2488.284, 2690.238, 2440.0457, 2195.169, 2382.9114, 2281.7925, 2622.519, 2373.7476, 2842.9915]
2025-09-13 01:16:18,738 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 01:16:18,738 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (2483.27) for latency MM1Queue_a033_s075
2025-09-13 01:16:18,764 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 2 hours, 49 minutes, 33 seconds)
2025-09-13 01:28:28,394 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:28:28,400 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 01:33:17,723 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2438.49951 ± 110.554
2025-09-13 01:33:17,731 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2518.5095, 2471.5703, 2576.3584, 2389.7253, 2314.8838, 2307.8352, 2513.5835, 2282.9297, 2610.5273, 2399.0715]
2025-09-13 01:33:17,731 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 01:33:17,778 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 2 hours, 32 minutes, 35 seconds)
2025-09-13 01:45:25,974 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:45:25,982 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 01:50:15,690 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2525.27905 ± 126.202
2025-09-13 01:50:15,691 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2581.4468, 2511.5723, 2546.117, 2354.8381, 2351.5806, 2724.8564, 2464.8135, 2746.7434, 2502.1348, 2468.6877]
2025-09-13 01:50:15,691 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 01:50:15,691 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (2525.28) for latency MM1Queue_a033_s075
2025-09-13 01:50:15,704 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 2 hours, 15 minutes, 40 seconds)
2025-09-13 02:02:23,979 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:02:23,995 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 02:07:14,055 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2471.55957 ± 137.094
2025-09-13 02:07:14,062 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2387.564, 2490.388, 2625.2505, 2432.8394, 2486.4136, 2283.7317, 2210.4045, 2615.085, 2624.5525, 2559.3672]
2025-09-13 02:07:14,062 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 02:07:14,078 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 58 minutes, 46 seconds)
2025-09-13 02:19:22,559 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:19:22,568 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 02:24:11,055 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2348.33740 ± 135.507
2025-09-13 02:24:11,075 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2367.4172, 2323.9177, 2486.685, 2128.875, 2419.6628, 2486.04, 2074.3306, 2327.6824, 2412.074, 2456.6902]
2025-09-13 02:24:11,075 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 02:24:11,095 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 41 minutes, 51 seconds)
2025-09-13 02:36:20,784 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:36:20,796 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 02:41:10,558 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2032.98755 ± 792.683
2025-09-13 02:41:10,565 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2364.1182, 2376.2227, 526.689, 2368.7925, 2595.1577, 2037.2694, 2372.504, 448.03415, 2762.7808, 2478.3066]
2025-09-13 02:41:10,565 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 02:41:10,609 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 24 minutes, 51 seconds)
2025-09-13 02:53:18,461 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:53:18,472 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 02:58:07,296 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2283.71484 ± 529.482
2025-09-13 02:58:07,297 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2349.437, 2209.3354, 2525.8076, 729.8676, 2533.3115, 2387.345, 2606.3857, 2540.6606, 2511.6138, 2443.3857]
2025-09-13 02:58:07,297 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 02:58:07,309 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 1 hour, 7 minutes, 51 seconds)
2025-09-13 03:10:15,778 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:10:15,784 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 03:15:06,715 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2338.88062 ± 295.186
2025-09-13 03:15:06,728 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1732.5502, 1967.5349, 2750.0288, 2511.1936, 2531.8027, 2248.5847, 2505.7512, 2495.275, 2508.6484, 2137.4358]
2025-09-13 03:15:06,728 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 03:15:06,753 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 50 minutes, 54 seconds)
2025-09-13 03:27:15,369 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:27:15,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 03:32:06,243 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2509.87817 ± 194.853
2025-09-13 03:32:06,249 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2475.9277, 2902.1235, 2526.773, 2384.3398, 2782.6372, 2296.6929, 2485.9668, 2475.1084, 2551.3547, 2217.8557]
2025-09-13 03:32:06,249 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 03:32:06,289 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 33 minutes, 56 seconds)
2025-09-13 03:44:16,200 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:44:16,207 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 03:49:02,780 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2547.28003 ± 106.750
2025-09-13 03:49:02,794 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2340.8918, 2499.1108, 2479.271, 2632.7258, 2623.019, 2751.5906, 2557.2588, 2573.7522, 2454.288, 2560.8916]
2025-09-13 03:49:02,794 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 03:49:02,794 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (2547.28) for latency MM1Queue_a033_s075
2025-09-13 03:49:02,811 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 16 minutes, 58 seconds)
2025-09-13 04:01:11,525 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:01:11,530 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 04:05:58,954 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2466.82544 ± 122.938
2025-09-13 04:05:58,976 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2574.8972, 2327.0027, 2621.9663, 2191.2158, 2481.6868, 2484.1396, 2452.9348, 2516.2246, 2428.0012, 2590.1865]
2025-09-13 04:05:58,976 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 04:05:59,005 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1251 [DEBUG]: Training session finished
