2025-09-12 03:11:10,829 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noiseperc20-walker2d/MM1Queue_a033_s075-mbpac_memdelay
2025-09-12 03:11:10,829 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noiseperc20-walker2d/MM1Queue_a033_s075-mbpac_memdelay
2025-09-12 03:11:10,829 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x14dd21bbef10>}
2025-09-12 03:11:10,829 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1111 [DEBUG]: using device: cuda
2025-09-12 03:11:10,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1133 [INFO]: Creating new trainer
2025-09-12 03:11:10,852 baseline-mbpac-noiseperc20-walker2d:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-09-12 03:11:10,853 baseline-mbpac-noiseperc20-walker2d:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-12 03:11:10,860 baseline-mbpac-noiseperc20-walker2d:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=6, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 384, batch_first=True)
)
2025-09-12 03:11:13,256 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1194 [DEBUG]: Starting training session...
2025-09-12 03:11:13,256 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 1/100
2025-09-12 03:22:24,064 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:22:24,065 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:23:26,836 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 271.75201 ± 100.522
2025-09-12 03:23:26,837 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [250.23111, 278.64307, 272.03983, 363.8561, 55.99962, 175.6762, 250.98645, 453.9464, 305.91235, 310.22913]
2025-09-12 03:23:26,837 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [134.0, 160.0, 172.0, 244.0, 177.0, 319.0, 142.0, 325.0, 187.0, 194.0]
2025-09-12 03:23:26,837 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1226 [INFO]: New best (271.75) for latency MM1Queue_a033_s075
2025-09-12 03:23:26,841 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 2/100 (estimated time remaining: 20 hours, 10 minutes, 24 seconds)
2025-09-12 03:36:17,659 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:36:17,661 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:37:05,047 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 121.05287 ± 134.775
2025-09-12 03:37:05,048 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [390.8688, 277.40466, 35.166687, 3.7838604, 123.238686, 16.780579, 27.54291, 20.60475, 281.8445, 33.293163]
2025-09-12 03:37:05,048 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [261.0, 195.0, 82.0, 98.0, 229.0, 146.0, 129.0, 99.0, 217.0, 106.0]
2025-09-12 03:37:05,054 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 3/100 (estimated time remaining: 21 hours, 7 minutes, 18 seconds)
2025-09-12 03:49:44,846 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:49:44,847 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:50:18,431 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 101.87398 ± 98.503
2025-09-12 03:50:18,431 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [28.815832, 63.633904, 255.31, 150.2785, -4.838225, 14.493512, 141.25195, 65.011925, 15.814644, 288.96777]
2025-09-12 03:50:18,431 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [109.0, 96.0, 135.0, 113.0, 111.0, 115.0, 94.0, 105.0, 29.0, 197.0]
2025-09-12 03:50:18,437 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 4/100 (estimated time remaining: 21 hours, 3 minutes, 47 seconds)
2025-09-12 04:03:01,857 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:03:01,858 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:03:55,848 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 184.82921 ± 144.395
2025-09-12 04:03:55,849 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [20.900642, 261.11285, 25.175232, 62.329845, 407.0646, 213.85837, 305.1698, 88.459755, 400.69427, 63.52668]
2025-09-12 04:03:55,849 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [213.0, 158.0, 33.0, 93.0, 221.0, 259.0, 179.0, 257.0, 264.0, 109.0]
2025-09-12 04:03:55,853 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 5/100 (estimated time remaining: 21 hours, 5 minutes, 2 seconds)
2025-09-12 04:16:30,090 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:16:30,093 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:17:15,980 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 121.78855 ± 119.511
2025-09-12 04:17:15,980 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [46.105705, 379.1638, -38.48051, 19.866554, 186.86401, 81.95464, 195.98248, 6.1077313, 118.212776, 222.10828]
2025-09-12 04:17:15,980 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [172.0, 220.0, 202.0, 37.0, 105.0, 131.0, 300.0, 16.0, 178.0, 132.0]
2025-09-12 04:17:15,991 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 6/100 (estimated time remaining: 20 hours, 54 minutes, 51 seconds)
2025-09-12 04:30:06,075 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:30:06,101 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:30:56,863 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 276.03534 ± 142.646
2025-09-12 04:30:56,863 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [61.077156, 4.9937134, 433.81824, 348.45648, 369.4473, 335.69162, 372.17606, 132.64601, 338.12875, 363.91806]
2025-09-12 04:30:56,863 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [125.0, 15.0, 315.0, 164.0, 181.0, 178.0, 188.0, 144.0, 184.0, 177.0]
2025-09-12 04:30:56,863 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1226 [INFO]: New best (276.04) for latency MM1Queue_a033_s075
2025-09-12 04:30:56,872 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 7/100 (estimated time remaining: 21 hours, 9 minutes)
2025-09-12 04:43:35,510 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:43:35,512 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:44:28,721 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 285.18924 ± 140.476
2025-09-12 04:44:28,721 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [296.39166, 31.403757, 362.75128, 370.33978, 446.34503, 398.12192, 102.80273, 104.20824, 354.16806, 385.36017]
2025-09-12 04:44:28,721 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [137.0, 99.0, 208.0, 193.0, 242.0, 199.0, 155.0, 137.0, 180.0, 190.0]
2025-09-12 04:44:28,721 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1226 [INFO]: New best (285.19) for latency MM1Queue_a033_s075
2025-09-12 04:44:28,731 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 8/100 (estimated time remaining: 20 hours, 53 minutes, 32 seconds)
2025-09-12 04:57:16,430 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:57:16,432 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:58:20,429 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 342.45593 ± 86.903
2025-09-12 04:58:20,430 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [359.86987, 438.061, 136.46114, 346.04797, 415.24942, 351.71716, 372.04266, 233.35301, 414.9963, 356.76077]
2025-09-12 04:58:20,430 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [176.0, 239.0, 190.0, 195.0, 261.0, 169.0, 198.0, 262.0, 199.0, 174.0]
2025-09-12 04:58:20,430 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1226 [INFO]: New best (342.46) for latency MM1Queue_a033_s075
2025-09-12 04:58:20,435 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 9/100 (estimated time remaining: 20 hours, 51 minutes, 48 seconds)
2025-09-12 05:11:11,888 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:11:11,890 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:12:17,073 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 399.58258 ± 49.155
2025-09-12 05:12:17,074 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [412.8097, 307.08252, 336.00983, 448.6868, 417.75156, 473.54944, 441.34613, 406.44836, 387.9008, 364.24036]
2025-09-12 05:12:17,074 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [199.0, 153.0, 170.0, 284.0, 210.0, 270.0, 239.0, 241.0, 191.0, 202.0]
2025-09-12 05:12:17,074 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1226 [INFO]: New best (399.58) for latency MM1Queue_a033_s075
2025-09-12 05:12:17,079 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 10/100 (estimated time remaining: 20 hours, 44 minutes, 2 seconds)
2025-09-12 05:25:01,800 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:25:01,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:26:02,647 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 284.30890 ± 104.976
2025-09-12 05:26:02,647 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [308.26953, 84.75399, 396.57355, 243.56938, 321.1545, 379.82938, 353.869, 219.47548, 400.97162, 134.62265]
2025-09-12 05:26:02,647 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [167.0, 180.0, 192.0, 137.0, 178.0, 202.0, 192.0, 281.0, 249.0, 186.0]
2025-09-12 05:26:02,654 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 11/100 (estimated time remaining: 20 hours, 37 minutes, 59 seconds)
2025-09-12 05:38:41,928 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:38:41,936 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:39:40,740 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 377.82843 ± 93.386
2025-09-12 05:39:40,740 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [420.21167, 435.00888, 362.801, 338.28397, 152.3699, 357.69247, 506.62494, 442.99533, 445.41187, 316.88403]
2025-09-12 05:39:40,740 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [216.0, 214.0, 218.0, 186.0, 98.0, 164.0, 226.0, 224.0, 239.0, 166.0]
2025-09-12 05:39:40,745 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 12/100 (estimated time remaining: 20 hours, 23 minutes, 24 seconds)
2025-09-12 05:52:26,098 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:52:26,105 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:53:38,455 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 409.71063 ± 160.539
2025-09-12 05:53:38,460 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [277.89902, 306.6665, 376.18674, 64.997856, 444.25266, 542.5134, 694.75714, 487.693, 461.11444, 441.02567]
2025-09-12 05:53:38,460 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [147.0, 179.0, 231.0, 78.0, 239.0, 284.0, 453.0, 276.0, 210.0, 261.0]
2025-09-12 05:53:38,460 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1226 [INFO]: New best (409.71) for latency MM1Queue_a033_s075
2025-09-12 05:53:38,470 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 13/100 (estimated time remaining: 20 hours, 17 minutes, 15 seconds)
2025-09-12 06:06:13,897 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:06:13,905 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:07:09,040 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 299.06732 ± 164.257
2025-09-12 06:07:09,040 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [291.97845, 9.853464, 21.333452, 364.9774, 472.05713, 379.1109, 498.4825, 429.86148, 195.93181, 327.08664]
2025-09-12 06:07:09,040 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [165.0, 17.0, 39.0, 169.0, 252.0, 221.0, 300.0, 221.0, 243.0, 164.0]
2025-09-12 06:07:09,047 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 14/100 (estimated time remaining: 19 hours, 57 minutes, 17 seconds)
2025-09-12 06:20:03,977 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:20:03,984 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:21:07,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 374.68759 ± 126.176
2025-09-12 06:21:07,959 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [153.47638, 266.3449, 616.3384, 360.3257, 363.03644, 477.3195, 477.93475, 265.5583, 326.07306, 440.46863]
2025-09-12 06:21:07,959 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [194.0, 145.0, 293.0, 196.0, 204.0, 276.0, 219.0, 154.0, 169.0, 233.0]
2025-09-12 06:21:07,965 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 15/100 (estimated time remaining: 19 hours, 44 minutes, 11 seconds)
2025-09-12 06:33:43,435 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:33:43,443 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:34:40,882 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 331.32370 ± 164.832
2025-09-12 06:34:40,883 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [400.47958, 500.37338, 428.67203, 295.3276, 502.02145, 373.78058, 412.4565, 3.0577338, 45.210384, 351.8577]
2025-09-12 06:34:40,883 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [231.0, 259.0, 250.0, 153.0, 309.0, 207.0, 205.0, 12.0, 62.0, 199.0]
2025-09-12 06:34:40,889 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 16/100 (estimated time remaining: 19 hours, 26 minutes, 49 seconds)
2025-09-12 06:47:38,991 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:47:38,993 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:48:58,245 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 475.76562 ± 107.278
2025-09-12 06:48:58,245 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [348.6099, 640.8903, 411.4538, 499.2267, 478.55096, 653.7061, 482.3832, 329.1791, 376.06195, 537.59406]
2025-09-12 06:48:58,245 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [183.0, 394.0, 208.0, 262.0, 327.0, 305.0, 222.0, 158.0, 209.0, 288.0]
2025-09-12 06:48:58,245 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1226 [INFO]: New best (475.77) for latency MM1Queue_a033_s075
2025-09-12 06:48:58,249 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 17/100 (estimated time remaining: 19 hours, 24 minutes, 6 seconds)
2025-09-12 07:01:49,648 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:01:49,650 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:02:48,216 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 371.73044 ± 142.793
2025-09-12 07:02:48,216 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [374.55963, 351.47357, 487.1783, 483.31876, 0.40310526, 467.70883, 525.04266, 351.8032, 384.89474, 290.92166]
2025-09-12 07:02:48,216 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [186.0, 190.0, 258.0, 243.0, 11.0, 235.0, 260.0, 190.0, 198.0, 152.0]
2025-09-12 07:02:48,220 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 18/100 (estimated time remaining: 19 hours, 8 minutes, 5 seconds)
2025-09-12 07:15:28,903 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:15:28,905 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:16:37,373 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 439.02280 ± 140.879
2025-09-12 07:16:37,373 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [601.0042, 116.54951, 478.5709, 544.81116, 402.07214, 451.3008, 485.42285, 485.3055, 257.22583, 567.9649]
2025-09-12 07:16:37,373 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [295.0, 153.0, 243.0, 274.0, 223.0, 205.0, 231.0, 217.0, 128.0, 284.0]
2025-09-12 07:16:37,377 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 19/100 (estimated time remaining: 18 hours, 59 minutes, 20 seconds)
2025-09-12 07:29:24,260 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:29:24,262 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:30:25,464 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 361.73193 ± 137.678
2025-09-12 07:30:25,464 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [346.79062, 297.33392, 391.3287, -0.2932919, 399.19153, 522.26337, 387.09357, 355.89542, 515.9764, 401.7392]
2025-09-12 07:30:25,464 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [185.0, 168.0, 245.0, 12.0, 192.0, 309.0, 208.0, 180.0, 286.0, 221.0]
2025-09-12 07:30:25,487 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 20/100 (estimated time remaining: 18 hours, 42 minutes, 31 seconds)
2025-09-12 07:43:08,027 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:43:08,049 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:44:17,672 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 434.39111 ± 88.036
2025-09-12 07:44:17,672 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [378.69376, 430.77805, 426.41412, 536.6064, 360.00677, 555.4154, 300.06168, 331.1268, 474.8798, 549.92804]
2025-09-12 07:44:17,672 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [184.0, 218.0, 245.0, 257.0, 179.0, 294.0, 169.0, 175.0, 243.0, 322.0]
2025-09-12 07:44:17,677 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 21/100 (estimated time remaining: 18 hours, 33 minutes, 48 seconds)
2025-09-12 07:56:57,852 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:56:57,860 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:58:02,448 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 399.88809 ± 129.243
2025-09-12 07:58:02,449 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [278.04968, 467.9369, 410.8718, 419.65234, 585.8421, 483.50058, 313.67557, 102.25811, 464.28143, 472.81226]
2025-09-12 07:58:02,449 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [135.0, 208.0, 203.0, 209.0, 287.0, 235.0, 167.0, 161.0, 279.0, 237.0]
2025-09-12 07:58:02,454 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 22/100 (estimated time remaining: 18 hours, 11 minutes, 18 seconds)
2025-09-12 08:10:58,491 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:10:58,499 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:12:06,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 360.42059 ± 155.801
2025-09-12 08:12:06,093 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [412.12646, 560.01105, 329.20413, 212.48244, 606.6882, 52.059143, 476.08228, 359.60175, 313.11664, 282.8339]
2025-09-12 08:12:06,093 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [308.0, 258.0, 166.0, 121.0, 366.0, 78.0, 307.0, 190.0, 266.0, 185.0]
2025-09-12 08:12:06,115 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 23/100 (estimated time remaining: 18 hours, 1 minute, 3 seconds)
2025-09-12 08:24:57,589 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:24:57,597 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:25:51,636 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 345.56766 ± 148.266
2025-09-12 08:25:51,636 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [450.2321, 9.709337, 181.96356, 252.48317, 493.9645, 488.44077, 350.1667, 471.6166, 386.4095, 370.69052]
2025-09-12 08:25:51,636 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [208.0, 20.0, 106.0, 123.0, 227.0, 231.0, 196.0, 212.0, 223.0, 201.0]
2025-09-12 08:25:51,647 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 24/100 (estimated time remaining: 17 hours, 46 minutes, 15 seconds)
2025-09-12 08:38:19,331 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:38:19,340 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:39:40,002 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 566.05585 ± 124.914
2025-09-12 08:39:40,003 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [531.35986, 688.71606, 684.22925, 521.8063, 564.4607, 399.67398, 719.49615, 325.98178, 685.55615, 539.279]
2025-09-12 08:39:40,003 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [243.0, 286.0, 273.0, 234.0, 338.0, 201.0, 302.0, 165.0, 352.0, 248.0]
2025-09-12 08:39:40,003 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1226 [INFO]: New best (566.06) for latency MM1Queue_a033_s075
2025-09-12 08:39:40,008 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 25/100 (estimated time remaining: 17 hours, 32 minutes, 28 seconds)
2025-09-12 08:52:24,411 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:52:24,431 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:53:27,862 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 466.17227 ± 219.954
2025-09-12 08:53:27,864 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [172.90323, 443.15576, 557.2926, 383.94376, 21.888168, 733.82184, 443.01102, 572.98364, 769.02, 563.7026]
2025-09-12 08:53:27,864 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [114.0, 204.0, 240.0, 180.0, 28.0, 325.0, 238.0, 234.0, 300.0, 219.0]
2025-09-12 08:53:27,871 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 26/100 (estimated time remaining: 17 hours, 17 minutes, 32 seconds)
2025-09-12 09:06:31,183 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:06:31,191 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:07:36,509 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 482.60529 ± 107.954
2025-09-12 09:07:36,510 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [355.02737, 717.1688, 472.3753, 503.63766, 636.1867, 433.09958, 425.59085, 373.36133, 485.8522, 423.75333]
2025-09-12 09:07:36,510 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [188.0, 333.0, 208.0, 212.0, 230.0, 182.0, 185.0, 167.0, 204.0, 210.0]
2025-09-12 09:07:36,517 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 27/100 (estimated time remaining: 17 hours, 9 minutes, 36 seconds)
2025-09-12 09:20:05,978 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:20:05,980 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:21:06,559 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 447.55331 ± 124.008
2025-09-12 09:21:06,559 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [191.73415, 371.07727, 493.4342, 430.49274, 458.16876, 321.0221, 660.71136, 554.18445, 464.17078, 530.53766]
2025-09-12 09:21:06,559 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [120.0, 161.0, 197.0, 196.0, 185.0, 179.0, 283.0, 214.0, 200.0, 228.0]
2025-09-12 09:21:06,575 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 28/100 (estimated time remaining: 16 hours, 47 minutes, 30 seconds)
2025-09-12 09:34:15,007 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:34:15,011 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:35:24,394 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 539.54358 ± 90.489
2025-09-12 09:35:24,395 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [462.73077, 488.3367, 590.668, 480.31665, 361.49103, 690.6047, 610.49854, 627.08386, 539.8503, 543.85535]
2025-09-12 09:35:24,395 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [190.0, 197.0, 232.0, 196.0, 228.0, 300.0, 272.0, 238.0, 205.0, 237.0]
2025-09-12 09:35:24,399 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 29/100 (estimated time remaining: 16 hours, 41 minutes, 27 seconds)
2025-09-12 09:47:56,722 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:47:56,724 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:48:57,782 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 476.00034 ± 182.003
2025-09-12 09:48:57,782 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [550.74896, -2.6588082, 564.5057, 654.84595, 363.68143, 575.2184, 598.509, 597.6426, 428.13797, 429.37265]
2025-09-12 09:48:57,782 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [210.0, 10.0, 232.0, 266.0, 178.0, 221.0, 272.0, 227.0, 179.0, 206.0]
2025-09-12 09:48:57,790 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 30/100 (estimated time remaining: 16 hours, 24 minutes)
2025-09-12 10:01:35,909 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:01:35,912 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:02:41,265 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 478.80487 ± 167.545
2025-09-12 10:02:41,267 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [528.6773, 590.4272, 539.31805, 574.3633, 446.49792, 384.44077, 28.839344, 522.2957, 498.77017, 674.4187]
2025-09-12 10:02:41,267 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [202.0, 215.0, 226.0, 253.0, 237.0, 178.0, 97.0, 229.0, 202.0, 284.0]
2025-09-12 10:02:41,274 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 31/100 (estimated time remaining: 16 hours, 9 minutes, 7 seconds)
2025-09-12 10:15:42,662 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:15:42,710 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:16:57,907 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 589.42212 ± 77.928
2025-09-12 10:16:57,908 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [467.95212, 637.399, 655.472, 660.3325, 591.5824, 441.92014, 552.07245, 633.75525, 683.2344, 570.50134]
2025-09-12 10:16:57,908 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [191.0, 242.0, 283.0, 284.0, 281.0, 178.0, 228.0, 259.0, 293.0, 224.0]
2025-09-12 10:16:57,908 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1226 [INFO]: New best (589.42) for latency MM1Queue_a033_s075
2025-09-12 10:16:57,914 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 32/100 (estimated time remaining: 15 hours, 57 minutes, 7 seconds)
2025-09-12 10:29:37,007 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:29:37,015 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:31:18,431 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 813.19617 ± 141.391
2025-09-12 10:31:18,433 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [783.8143, 832.6624, 866.8002, 684.08154, 853.5625, 588.8745, 1015.2694, 1069.4641, 681.1669, 756.26605]
2025-09-12 10:31:18,433 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [306.0, 322.0, 354.0, 278.0, 346.0, 233.0, 445.0, 418.0, 311.0, 302.0]
2025-09-12 10:31:18,433 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1226 [INFO]: New best (813.20) for latency MM1Queue_a033_s075
2025-09-12 10:31:18,439 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 33/100 (estimated time remaining: 15 hours, 54 minutes, 41 seconds)
2025-09-12 10:43:57,405 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:43:57,407 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:45:14,581 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 599.11334 ± 188.090
2025-09-12 10:45:14,581 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [682.9406, 760.2413, 694.87115, 697.6278, 56.509617, 637.5874, 647.4841, 572.2824, 652.5298, 589.05914]
2025-09-12 10:45:14,581 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [304.0, 268.0, 244.0, 307.0, 90.0, 249.0, 292.0, 251.0, 260.0, 260.0]
2025-09-12 10:45:14,587 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 34/100 (estimated time remaining: 15 hours, 35 minutes, 48 seconds)
2025-09-12 10:58:01,646 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:58:01,655 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:59:36,142 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 699.51483 ± 294.246
2025-09-12 10:59:36,142 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1360.5781, 643.57654, 124.446915, 588.89233, 875.7828, 573.3642, 724.4031, 603.64874, 648.26685, 852.1893]
2025-09-12 10:59:36,142 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [588.0, 249.0, 193.0, 248.0, 384.0, 214.0, 321.0, 255.0, 285.0, 326.0]
2025-09-12 10:59:36,147 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 35/100 (estimated time remaining: 15 hours, 32 minutes, 26 seconds)
2025-09-12 11:12:36,585 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:12:36,593 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:14:01,400 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 681.72327 ± 88.543
2025-09-12 11:14:01,401 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [585.7948, 749.73444, 636.71857, 620.7371, 815.83875, 646.247, 691.17535, 564.46295, 841.31226, 665.2114]
2025-09-12 11:14:01,401 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [212.0, 268.0, 243.0, 239.0, 323.0, 262.0, 292.0, 218.0, 427.0, 249.0]
2025-09-12 11:14:01,408 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 36/100 (estimated time remaining: 15 hours, 27 minutes, 21 seconds)
2025-09-12 11:26:31,077 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:26:31,079 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:28:02,012 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 757.57007 ± 138.636
2025-09-12 11:28:02,012 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [532.6419, 802.93097, 859.6656, 832.2192, 673.3752, 705.97394, 646.8567, 616.7914, 1005.70447, 899.54144]
2025-09-12 11:28:02,012 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [250.0, 300.0, 338.0, 304.0, 237.0, 283.0, 240.0, 233.0, 384.0, 370.0]
2025-09-12 11:28:02,018 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 37/100 (estimated time remaining: 15 hours, 9 minutes, 40 seconds)
2025-09-12 11:41:00,429 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:41:00,431 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:42:28,065 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 687.10486 ± 215.746
2025-09-12 11:42:28,083 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [798.42664, 663.8681, 581.25934, 720.6102, 726.6549, 776.1952, 122.786606, 716.19904, 1014.8613, 750.18695]
2025-09-12 11:42:28,083 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [350.0, 269.0, 222.0, 251.0, 285.0, 304.0, 203.0, 316.0, 413.0, 272.0]
2025-09-12 11:42:28,095 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 38/100 (estimated time remaining: 14 hours, 56 minutes, 37 seconds)
2025-09-12 11:55:09,620 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:55:09,622 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:56:33,819 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 662.06372 ± 64.106
2025-09-12 11:56:33,831 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [651.1147, 637.12976, 674.58673, 793.9516, 573.0068, 699.6178, 736.83356, 644.53766, 582.1807, 627.67816]
2025-09-12 11:56:33,831 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [283.0, 278.0, 309.0, 309.0, 233.0, 283.0, 285.0, 247.0, 257.0, 245.0]
2025-09-12 11:56:33,841 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 39/100 (estimated time remaining: 14 hours, 44 minutes, 22 seconds)
2025-09-12 12:09:23,531 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:09:23,533 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:10:40,861 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 602.70471 ± 243.688
2025-09-12 12:10:40,863 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [-0.025970757, 674.6685, 444.60815, 908.8226, 575.3081, 830.0486, 599.1289, 698.84204, 504.8015, 790.84485]
2025-09-12 12:10:40,863 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [10.0, 257.0, 178.0, 353.0, 225.0, 348.0, 252.0, 361.0, 186.0, 323.0]
2025-09-12 12:10:40,873 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 40/100 (estimated time remaining: 14 hours, 27 minutes, 9 seconds)
2025-09-12 12:23:27,805 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:23:27,810 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:25:55,195 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 1102.15063 ± 380.484
2025-09-12 12:25:55,196 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [646.2416, 792.1055, 1385.4902, 1505.4281, 1617.0095, 936.5906, 770.5059, 1075.599, 649.42267, 1643.1129]
2025-09-12 12:25:55,196 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [229.0, 295.0, 633.0, 630.0, 702.0, 362.0, 308.0, 536.0, 299.0, 878.0]
2025-09-12 12:25:55,196 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1226 [INFO]: New best (1102.15) for latency MM1Queue_a033_s075
2025-09-12 12:25:55,202 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 41/100 (estimated time remaining: 14 hours, 22 minutes, 45 seconds)
2025-09-12 12:38:42,214 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:38:42,217 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:40:01,737 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 664.99164 ± 101.508
2025-09-12 12:40:01,737 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [871.87085, 474.95923, 661.2306, 727.07996, 628.02625, 775.22473, 634.3843, 608.56537, 615.1273, 653.44763]
2025-09-12 12:40:01,737 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [309.0, 173.0, 229.0, 314.0, 271.0, 322.0, 223.0, 235.0, 217.0, 292.0]
2025-09-12 12:40:01,744 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 42/100 (estimated time remaining: 14 hours, 9 minutes, 32 seconds)
2025-09-12 12:52:51,297 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:52:51,302 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:54:40,533 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 873.79822 ± 197.826
2025-09-12 12:54:40,533 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1120.107, 837.59796, 732.1843, 699.8245, 1240.8009, 769.68915, 802.2128, 827.7279, 603.4355, 1104.402]
2025-09-12 12:54:40,533 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [426.0, 341.0, 313.0, 286.0, 585.0, 319.0, 346.0, 297.0, 243.0, 406.0]
2025-09-12 12:54:40,541 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 43/100 (estimated time remaining: 13 hours, 57 minutes, 36 seconds)
2025-09-12 13:07:15,084 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:07:15,098 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:08:42,526 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 702.80164 ± 207.308
2025-09-12 13:08:42,530 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [808.26385, 804.7062, 745.0535, 963.76556, 604.12976, 739.8481, 715.46106, 135.92123, 765.66724, 745.19965]
2025-09-12 13:08:42,530 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [305.0, 333.0, 332.0, 385.0, 216.0, 267.0, 300.0, 150.0, 285.0, 267.0]
2025-09-12 13:08:42,549 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 44/100 (estimated time remaining: 13 hours, 42 minutes, 27 seconds)
2025-09-12 13:21:33,453 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:21:33,455 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:23:00,205 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 742.68054 ± 128.255
2025-09-12 13:23:00,214 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [933.00946, 757.4757, 929.90424, 922.39343, 612.3703, 688.83374, 627.337, 629.6492, 694.6465, 631.18585]
2025-09-12 13:23:00,214 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [350.0, 272.0, 345.0, 357.0, 245.0, 266.0, 215.0, 283.0, 264.0, 264.0]
2025-09-12 13:23:00,220 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 45/100 (estimated time remaining: 13 hours, 30 minutes)
2025-09-12 13:35:53,496 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:35:53,506 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:37:51,280 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 973.69647 ± 216.919
2025-09-12 13:37:51,281 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [812.084, 742.6758, 1284.0118, 1107.4543, 834.71265, 764.80414, 955.3486, 785.2217, 1064.9868, 1385.6638]
2025-09-12 13:37:51,281 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [339.0, 302.0, 516.0, 457.0, 316.0, 302.0, 355.0, 303.0, 422.0, 553.0]
2025-09-12 13:37:51,287 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 46/100 (estimated time remaining: 13 hours, 11 minutes, 16 seconds)
2025-09-12 13:50:14,562 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:50:14,565 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:51:46,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 754.47217 ± 143.471
2025-09-12 13:51:46,839 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [701.4059, 1108.9235, 618.01495, 650.47797, 644.9856, 819.7396, 633.40717, 803.8762, 862.8712, 701.0197]
2025-09-12 13:51:46,839 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [289.0, 428.0, 280.0, 242.0, 286.0, 342.0, 263.0, 284.0, 303.0, 266.0]
2025-09-12 13:51:46,860 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 47/100 (estimated time remaining: 12 hours, 54 minutes, 55 seconds)
2025-09-12 14:04:33,142 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:04:33,150 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:06:05,886 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 755.62695 ± 127.956
2025-09-12 14:06:05,887 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [845.9826, 806.4508, 581.77795, 882.5038, 1003.07135, 643.54865, 639.61847, 715.93835, 805.60016, 631.77765]
2025-09-12 14:06:05,887 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [357.0, 314.0, 238.0, 363.0, 373.0, 282.0, 279.0, 282.0, 309.0, 223.0]
2025-09-12 14:06:05,896 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 48/100 (estimated time remaining: 12 hours, 37 minutes, 4 seconds)
2025-09-12 14:18:52,450 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:18:52,453 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:20:22,284 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 772.35681 ± 159.670
2025-09-12 14:20:22,284 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [520.71497, 737.1012, 882.10986, 752.57965, 889.04944, 613.90906, 711.7547, 1003.7048, 606.67706, 1005.9666]
2025-09-12 14:20:22,285 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [229.0, 231.0, 341.0, 319.0, 349.0, 239.0, 257.0, 364.0, 228.0, 363.0]
2025-09-12 14:20:22,295 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 49/100 (estimated time remaining: 12 hours, 25 minutes, 17 seconds)
2025-09-12 14:33:27,636 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:33:27,640 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:34:43,243 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 676.38300 ± 171.523
2025-09-12 14:34:43,247 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [683.1466, 266.3938, 677.3168, 941.7217, 730.47266, 590.61566, 736.596, 843.06665, 725.58246, 568.9173]
2025-09-12 14:34:43,247 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [229.0, 121.0, 258.0, 323.0, 238.0, 209.0, 288.0, 303.0, 272.0, 224.0]
2025-09-12 14:34:43,263 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 50/100 (estimated time remaining: 12 hours, 11 minutes, 31 seconds)
2025-09-12 14:47:23,660 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:47:23,663 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:48:51,687 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 757.70630 ± 256.208
2025-09-12 14:48:51,693 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [713.137, 719.99744, 672.7307, 285.31094, 666.9835, 898.1245, 1012.18475, 1124.5939, 1054.1143, 429.8859]
2025-09-12 14:48:51,694 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [290.0, 279.0, 255.0, 141.0, 269.0, 347.0, 354.0, 400.0, 380.0, 194.0]
2025-09-12 14:48:51,702 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 51/100 (estimated time remaining: 11 hours, 50 minutes, 4 seconds)
2025-09-12 15:01:47,015 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:01:47,018 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:03:40,232 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 999.66846 ± 221.630
2025-09-12 15:03:40,254 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [863.36206, 751.67413, 1241.7123, 1161.5222, 940.8042, 902.947, 998.8053, 1497.8748, 821.7958, 816.1866]
2025-09-12 15:03:40,254 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [339.0, 280.0, 595.0, 399.0, 336.0, 344.0, 342.0, 502.0, 295.0, 277.0]
2025-09-12 15:03:40,266 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 52/100 (estimated time remaining: 11 hours, 44 minutes, 31 seconds)
2025-09-12 15:16:28,722 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:16:28,724 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:17:58,758 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 824.64319 ± 151.174
2025-09-12 15:17:58,778 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [732.745, 822.7924, 1021.3901, 1072.6772, 809.40155, 629.602, 965.32275, 583.85, 842.9549, 765.6957]
2025-09-12 15:17:58,778 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [280.0, 299.0, 351.0, 390.0, 274.0, 262.0, 326.0, 205.0, 312.0, 250.0]
2025-09-12 15:17:58,791 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 53/100 (estimated time remaining: 11 hours, 30 minutes, 3 seconds)
2025-09-12 15:30:38,575 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:30:38,578 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:31:55,784 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 668.25830 ± 247.712
2025-09-12 15:31:55,792 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [740.77783, 910.1647, 315.16898, 1000.0959, 601.581, 731.1664, 948.3665, 613.1111, 626.5467, 195.60333]
2025-09-12 15:31:55,792 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [270.0, 352.0, 138.0, 347.0, 238.0, 279.0, 318.0, 239.0, 245.0, 107.0]
2025-09-12 15:31:55,814 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 54/100 (estimated time remaining: 11 hours, 12 minutes, 39 seconds)
2025-09-12 15:44:37,415 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:44:37,417 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:45:59,250 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 772.58582 ± 187.486
2025-09-12 15:45:59,250 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [803.57153, 830.37885, 925.4874, 716.43945, 645.7657, 564.5001, 432.807, 738.93195, 1097.437, 970.5389]
2025-09-12 15:45:59,250 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [303.0, 296.0, 292.0, 258.0, 217.0, 195.0, 167.0, 247.0, 388.0, 346.0]
2025-09-12 15:45:59,258 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 55/100 (estimated time remaining: 10 hours, 55 minutes, 39 seconds)
2025-09-12 15:58:46,827 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:58:46,831 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:00:01,039 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 672.82306 ± 348.352
2025-09-12 16:00:01,041 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [948.7324, 888.72375, 167.47821, 148.36871, 1091.8043, 1037.495, 699.4974, 747.1526, 804.01306, 194.9646]
2025-09-12 16:00:01,041 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [312.0, 314.0, 88.0, 82.0, 380.0, 338.0, 246.0, 270.0, 274.0, 100.0]
2025-09-12 16:00:01,048 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 56/100 (estimated time remaining: 10 hours, 40 minutes, 24 seconds)
2025-09-12 16:12:43,399 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:12:43,401 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:14:14,112 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 855.04211 ± 320.463
2025-09-12 16:14:14,113 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1320.5869, 270.7143, 685.5871, 968.3592, 717.3809, 1282.5653, 666.4355, 603.10803, 1209.493, 826.19037]
2025-09-12 16:14:14,113 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [462.0, 125.0, 242.0, 324.0, 286.0, 414.0, 220.0, 223.0, 391.0, 272.0]
2025-09-12 16:14:14,120 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 57/100 (estimated time remaining: 10 hours, 20 minutes, 57 seconds)
2025-09-12 16:27:07,085 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:27:07,096 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:28:40,218 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 878.12659 ± 190.706
2025-09-12 16:28:40,218 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1174.4609, 528.01196, 1111.7196, 689.56354, 1040.02, 925.86816, 878.0617, 929.7175, 742.8584, 760.98364]
2025-09-12 16:28:40,218 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [417.0, 186.0, 345.0, 262.0, 360.0, 315.0, 288.0, 295.0, 330.0, 272.0]
2025-09-12 16:28:40,225 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 58/100 (estimated time remaining: 10 hours, 7 minutes, 56 seconds)
2025-09-12 16:41:34,128 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:41:34,131 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:43:14,386 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 983.10626 ± 242.777
2025-09-12 16:43:14,387 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [769.8374, 1072.5833, 746.9033, 605.6431, 1065.2009, 846.5806, 1038.1766, 976.3343, 1482.82, 1226.9822]
2025-09-12 16:43:14,387 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [291.0, 341.0, 263.0, 203.0, 337.0, 286.0, 346.0, 318.0, 477.0, 406.0]
2025-09-12 16:43:14,394 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 59/100 (estimated time remaining: 9 hours, 59 minutes)
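The "estimated time remaining" strings drop zero-valued units and singularize ("9 hours, 59 minutes"; "2 hours, 42 minutes, 1 second"). A small formatter reproducing that observed output format (a sketch inferred from the log lines, not the project's implementation; how the remaining seconds themselves are estimated is not shown here):

```python
def format_eta(seconds):
    """Render a duration the way the training loop logs it:
    zero-valued units are omitted and unit names are singularized,
    e.g. '9 hours, 59 minutes' or '2 hours, 42 minutes, 1 second'.
    """
    seconds = int(seconds)
    parts = []
    for unit, size in (("hour", 3600), ("minute", 60), ("second", 1)):
        value, seconds = divmod(seconds, size)
        if value:
            parts.append(f"{value} {unit}{'s' if value != 1 else ''}")
    return ", ".join(parts) if parts else "0 seconds"

format_eta(9 * 3600 + 59 * 60)      # '9 hours, 59 minutes'
format_eta(2 * 3600 + 42 * 60 + 1)  # '2 hours, 42 minutes, 1 second'
```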
2025-09-12 16:56:01,222 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:56:01,225 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:57:34,151 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 875.36505 ± 271.313
2025-09-12 16:57:34,152 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [827.9342, 690.4254, 873.4692, 980.70685, 869.9781, 261.45844, 732.49164, 1121.622, 1091.2799, 1304.2847]
2025-09-12 16:57:34,152 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [279.0, 262.0, 293.0, 321.0, 290.0, 117.0, 295.0, 371.0, 385.0, 454.0]
2025-09-12 16:57:34,161 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 60/100 (estimated time remaining: 9 hours, 46 minutes, 58 seconds)
2025-09-12 17:10:14,917 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:10:14,920 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:11:32,336 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 690.69824 ± 295.075
2025-09-12 17:11:32,337 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [609.9459, 664.632, 202.72713, 957.72064, 671.7316, 163.99171, 777.0698, 941.8977, 782.34576, 1134.92]
2025-09-12 17:11:32,337 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [212.0, 248.0, 121.0, 339.0, 255.0, 93.0, 283.0, 325.0, 278.0, 362.0]
2025-09-12 17:11:32,348 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 61/100 (estimated time remaining: 9 hours, 32 minutes, 10 seconds)
2025-09-12 17:24:23,673 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:24:23,676 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:26:02,699 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 942.92004 ± 199.467
2025-09-12 17:26:02,700 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [621.4777, 1114.1444, 1322.0315, 1012.70483, 969.1649, 938.25385, 1070.8405, 905.7311, 654.6064, 820.2452]
2025-09-12 17:26:02,700 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [219.0, 372.0, 416.0, 324.0, 346.0, 321.0, 444.0, 298.0, 246.0, 261.0]
2025-09-12 17:26:02,709 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 62/100 (estimated time remaining: 9 hours, 20 minutes, 6 seconds)
2025-09-12 17:38:57,655 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:38:57,658 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:40:17,726 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 762.81067 ± 293.987
2025-09-12 17:40:17,744 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [762.95435, 812.1718, 647.3007, 210.17447, 663.19995, 848.8764, 428.5892, 1329.7936, 882.43896, 1042.6073]
2025-09-12 17:40:17,744 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [269.0, 287.0, 224.0, 99.0, 227.0, 282.0, 168.0, 407.0, 283.0, 387.0]
2025-09-12 17:40:17,757 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 63/100 (estimated time remaining: 9 hours, 4 minutes, 21 seconds)
2025-09-12 17:53:14,470 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:53:14,472 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:54:35,218 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 756.80536 ± 217.143
2025-09-12 17:54:35,227 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [682.05597, 817.0311, 890.01953, 837.9073, 707.5857, 1053.8219, 837.4986, 230.59727, 576.99054, 934.54535]
2025-09-12 17:54:35,228 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [241.0, 279.0, 314.0, 270.0, 243.0, 345.0, 283.0, 115.0, 207.0, 318.0]
2025-09-12 17:54:35,259 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 64/100 (estimated time remaining: 8 hours, 47 minutes, 58 seconds)
2025-09-12 18:07:11,304 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:07:11,308 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:08:51,038 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 964.28693 ± 162.796
2025-09-12 18:08:51,038 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [891.48016, 1246.6768, 687.1658, 1042.5569, 1030.9031, 1080.2047, 782.6898, 1050.4608, 784.9966, 1045.7349]
2025-09-12 18:08:51,038 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [285.0, 437.0, 234.0, 350.0, 343.0, 364.0, 276.0, 356.0, 267.0, 343.0]
2025-09-12 18:08:51,046 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 65/100 (estimated time remaining: 8 hours, 33 minutes, 13 seconds)
2025-09-12 18:21:44,269 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:21:44,277 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:23:28,235 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 1015.11682 ± 125.972
2025-09-12 18:23:28,237 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1214.6245, 1074.4883, 970.6723, 992.27954, 893.3542, 930.2607, 926.8052, 861.9195, 1027.9156, 1258.8489]
2025-09-12 18:23:28,237 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [404.0, 327.0, 313.0, 338.0, 321.0, 323.0, 310.0, 303.0, 346.0, 393.0]
2025-09-12 18:23:28,247 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 66/100 (estimated time remaining: 8 hours, 23 minutes, 31 seconds)
2025-09-12 18:36:27,241 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:36:27,244 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:37:51,477 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 812.20605 ± 233.794
2025-09-12 18:37:51,490 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [835.14465, 910.80365, 954.8025, 903.67773, 932.376, 1031.096, 158.49445, 818.51105, 696.92365, 880.23035]
2025-09-12 18:37:51,490 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [286.0, 313.0, 345.0, 315.0, 292.0, 325.0, 80.0, 297.0, 234.0, 283.0]
2025-09-12 18:37:51,516 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 67/100 (estimated time remaining: 8 hours, 8 minutes, 19 seconds)
2025-09-12 18:50:44,904 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:50:44,907 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:52:12,072 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 821.16199 ± 121.572
2025-09-12 18:52:12,073 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1003.686, 782.13837, 853.80566, 850.01904, 808.3196, 707.5289, 722.40765, 937.10095, 585.9434, 960.6709]
2025-09-12 18:52:12,073 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [336.0, 276.0, 285.0, 344.0, 281.0, 257.0, 253.0, 324.0, 201.0, 321.0]
2025-09-12 18:52:12,099 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 68/100 (estimated time remaining: 7 hours, 54 minutes, 34 seconds)
2025-09-12 19:04:49,119 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:04:49,124 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:06:22,561 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 908.56085 ± 336.208
2025-09-12 19:06:22,562 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1144.4385, 705.37476, 799.4074, 208.6949, 675.72064, 1418.6133, 1245.9009, 798.75006, 879.1218, 1209.5857]
2025-09-12 19:06:22,562 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [370.0, 241.0, 249.0, 104.0, 231.0, 461.0, 404.0, 294.0, 315.0, 394.0]
2025-09-12 19:06:22,570 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 69/100 (estimated time remaining: 7 hours, 39 minutes, 26 seconds)
2025-09-12 19:18:59,123 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:18:59,128 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:20:32,200 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 904.80713 ± 266.456
2025-09-12 19:20:32,201 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [759.94684, 873.936, 1130.7131, 909.5061, 1029.0294, 1055.2423, 314.9854, 830.03516, 764.6704, 1380.006]
2025-09-12 19:20:32,201 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [270.0, 300.0, 349.0, 302.0, 338.0, 343.0, 131.0, 282.0, 282.0, 420.0]
2025-09-12 19:20:32,209 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 70/100 (estimated time remaining: 7 hours, 24 minutes, 27 seconds)
2025-09-12 19:33:37,327 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:33:37,330 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:35:17,430 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 952.39343 ± 349.631
2025-09-12 19:35:17,455 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1213.1228, 569.2841, 1116.6649, 924.96027, 1021.82794, 977.4713, 1000.5281, 967.45416, 1558.0879, 174.53328]
2025-09-12 19:35:17,455 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [386.0, 209.0, 371.0, 313.0, 359.0, 317.0, 377.0, 319.0, 497.0, 91.0]
2025-09-12 19:35:17,463 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 71/100 (estimated time remaining: 7 hours, 10 minutes, 55 seconds)
2025-09-12 19:48:07,171 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:48:07,174 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:50:35,946 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 1554.07983 ± 795.080
2025-09-12 19:50:35,950 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1836.6736, 3057.7734, 712.6607, 2220.8962, 794.3506, 1962.9397, 2361.534, 828.17535, 839.911, 925.88403]
2025-09-12 19:50:35,950 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [575.0, 937.0, 260.0, 674.0, 282.0, 578.0, 686.0, 295.0, 312.0, 310.0]
2025-09-12 19:50:35,950 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1226 [INFO]: New best (1554.08) for latency MM1Queue_a033_s075
2025-09-12 19:50:35,961 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 72/100 (estimated time remaining: 7 hours, 1 minute, 53 seconds)
2025-09-12 20:03:08,908 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:03:08,919 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:04:51,381 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 1008.01819 ± 230.468
2025-09-12 20:04:51,382 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1023.96893, 1163.723, 1256.2703, 848.4549, 848.2697, 1096.2865, 968.3778, 1435.171, 854.22205, 585.43805]
2025-09-12 20:04:51,382 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [357.0, 404.0, 391.0, 289.0, 296.0, 380.0, 322.0, 428.0, 316.0, 204.0]
2025-09-12 20:04:51,395 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 73/100 (estimated time remaining: 6 hours, 46 minutes, 52 seconds)
2025-09-12 20:18:01,378 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:18:01,386 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:19:53,598 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 1138.91516 ± 681.755
2025-09-12 20:19:53,599 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [972.00305, 912.322, 779.483, 844.48035, 984.7302, 1058.3011, 892.8677, 908.5986, 3172.0327, 864.3334]
2025-09-12 20:19:53,599 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [332.0, 302.0, 275.0, 293.0, 335.0, 342.0, 301.0, 310.0, 899.0, 292.0]
2025-09-12 20:19:53,609 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 74/100 (estimated time remaining: 6 hours, 36 minutes, 59 seconds)
2025-09-12 20:32:34,401 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:32:34,404 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:34:21,984 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 1007.09406 ± 311.095
2025-09-12 20:34:21,989 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1524.759, 776.25903, 1096.8873, 1246.6022, 1250.6436, 1026.5865, 986.4186, 1086.0731, 732.42035, 344.2913]
2025-09-12 20:34:21,989 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [594.0, 272.0, 369.0, 395.0, 403.0, 348.0, 351.0, 365.0, 257.0, 130.0]
2025-09-12 20:34:22,008 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 75/100 (estimated time remaining: 6 hours, 23 minutes, 54 seconds)
2025-09-12 20:47:01,078 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:47:01,080 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:48:48,858 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 1059.25085 ± 227.379
2025-09-12 20:48:48,859 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [926.76715, 1234.2349, 1023.19305, 654.47784, 1210.2902, 1292.169, 1143.1742, 822.0113, 869.91974, 1416.2717]
2025-09-12 20:48:48,860 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [322.0, 386.0, 344.0, 224.0, 408.0, 426.0, 376.0, 295.0, 314.0, 442.0]
2025-09-12 20:48:48,871 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 76/100 (estimated time remaining: 6 hours, 7 minutes, 37 seconds)
2025-09-12 21:01:41,610 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:01:41,612 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:03:53,112 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 1318.03113 ± 708.754
2025-09-12 21:03:53,114 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1037.2681, 664.6148, 830.2936, 3213.2356, 1218.0049, 1349.2844, 1819.87, 778.0265, 1316.9547, 952.7584]
2025-09-12 21:03:53,114 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [355.0, 256.0, 288.0, 1000.0, 391.0, 437.0, 567.0, 284.0, 432.0, 308.0]
2025-09-12 21:03:53,126 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 77/100 (estimated time remaining: 5 hours, 51 minutes, 46 seconds)
2025-09-12 21:16:42,550 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:16:42,552 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:19:55,680 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 1921.24353 ± 955.245
2025-09-12 21:19:55,681 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [3034.8662, 1028.9968, 3224.4468, 986.5, 1739.8579, 1474.1472, 2729.4219, 1168.6967, 674.978, 3150.5237]
2025-09-12 21:19:55,681 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 340.0, 1000.0, 325.0, 561.0, 492.0, 879.0, 389.0, 230.0, 1000.0]
2025-09-12 21:19:55,681 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1226 [INFO]: New best (1921.24) for latency MM1Queue_a033_s075
2025-09-12 21:19:55,697 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 78/100 (estimated time remaining: 5 hours, 45 minutes, 19 seconds)
2025-09-12 21:33:15,968 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:33:15,972 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:35:33,064 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 1395.17773 ± 639.522
2025-09-12 21:35:33,064 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [158.1931, 1099.8483, 1243.9843, 1138.9999, 1206.919, 2011.1091, 802.96704, 2395.3328, 1881.1725, 2013.2507]
2025-09-12 21:35:33,064 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [94.0, 392.0, 401.0, 385.0, 391.0, 615.0, 280.0, 732.0, 561.0, 641.0]
2025-09-12 21:35:33,074 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 79/100 (estimated time remaining: 5 hours, 32 minutes, 53 seconds)
2025-09-12 21:48:20,644 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:48:20,649 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:50:55,960 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 1615.59839 ± 791.573
2025-09-12 21:50:55,961 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1844.4401, 251.49545, 823.7311, 2122.728, 1304.7108, 2798.3022, 1207.3893, 2212.8235, 2627.421, 962.9428]
2025-09-12 21:50:55,961 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [602.0, 119.0, 299.0, 598.0, 470.0, 822.0, 412.0, 685.0, 768.0, 327.0]
2025-09-12 21:50:55,998 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 80/100 (estimated time remaining: 5 hours, 21 minutes, 34 seconds)
2025-09-12 22:03:13,393 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:03:13,396 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:05:58,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 1613.26526 ± 695.434
2025-09-12 22:05:58,093 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1001.65125, 705.48303, 952.8596, 1250.2373, 2185.7607, 2214.911, 3069.604, 1994.2194, 1330.7911, 1427.1349]
2025-09-12 22:05:58,093 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [352.0, 248.0, 335.0, 438.0, 651.0, 752.0, 930.0, 703.0, 435.0, 460.0]
2025-09-12 22:05:58,106 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 81/100 (estimated time remaining: 5 hours, 8 minutes, 36 seconds)
2025-09-12 22:18:51,372 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:18:51,375 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:20:54,818 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 1256.50952 ± 842.076
2025-09-12 22:20:54,820 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [3337.1116, 2239.871, 930.1691, 855.8963, 173.65053, 1064.8713, 952.701, 1108.4199, 1046.2534, 856.152]
2025-09-12 22:20:54,820 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 673.0, 327.0, 286.0, 86.0, 341.0, 329.0, 363.0, 349.0, 312.0]
2025-09-12 22:20:54,831 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 82/100 (estimated time remaining: 4 hours, 52 minutes, 42 seconds)
2025-09-12 22:33:45,776 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:33:45,783 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:36:22,837 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 1642.17212 ± 1124.852
2025-09-12 22:36:22,838 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [253.62794, 369.04254, 1027.902, 2845.685, 2548.7937, 3289.4727, 2830.0947, 2004.392, 887.6288, 365.08]
2025-09-12 22:36:22,838 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [114.0, 151.0, 347.0, 854.0, 751.0, 1000.0, 868.0, 604.0, 292.0, 150.0]
2025-09-12 22:36:22,848 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 83/100 (estimated time remaining: 4 hours, 35 minutes, 13 seconds)
2025-09-12 22:50:21,761 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:50:21,767 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:53:55,081 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 2262.70117 ± 784.589
2025-09-12 22:53:55,082 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [3207.2776, 3084.6333, 1678.5192, 1201.3785, 2759.2256, 2817.1204, 1746.8284, 923.0528, 2973.266, 2235.71]
2025-09-12 22:53:55,082 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [935.0, 898.0, 529.0, 372.0, 839.0, 802.0, 544.0, 323.0, 1000.0, 671.0]
2025-09-12 22:53:55,082 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1226 [INFO]: New best (2262.70) for latency MM1Queue_a033_s075
2025-09-12 22:53:55,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 84/100 (estimated time remaining: 4 hours, 26 minutes, 26 seconds)
2025-09-12 23:05:53,611 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:05:53,615 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 23:08:19,940 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 1487.82690 ± 793.461
2025-09-12 23:08:19,941 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [211.54846, 2168.3752, 1224.726, 918.7385, 2202.3813, 1537.4183, 833.6408, 1147.1691, 1487.0366, 3147.2332]
2025-09-12 23:08:19,941 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [105.0, 662.0, 414.0, 317.0, 686.0, 477.0, 293.0, 370.0, 502.0, 962.0]
2025-09-12 23:08:19,951 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 85/100 (estimated time remaining: 4 hours, 7 minutes, 40 seconds)
2025-09-12 23:21:14,731 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:21:14,734 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 23:24:09,155 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 1874.46387 ± 821.758
2025-09-12 23:24:09,157 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [2307.3586, 1156.3887, 1650.7421, 2470.367, 2794.7012, 3545.356, 987.1057, 1134.4192, 1105.3071, 1592.893]
2025-09-12 23:24:09,157 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [685.0, 377.0, 500.0, 729.0, 796.0, 993.0, 332.0, 367.0, 375.0, 486.0]
2025-09-12 23:24:09,171 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 86/100 (estimated time remaining: 3 hours, 54 minutes, 33 seconds)
2025-09-12 23:36:58,615 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:36:58,618 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 23:40:01,677 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 1922.60486 ± 904.409
2025-09-12 23:40:01,685 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1542.2463, 862.94293, 1053.5035, 3364.9663, 2355.774, 1075.1863, 3290.1511, 1612.886, 2802.143, 1266.2496]
2025-09-12 23:40:01,685 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [495.0, 300.0, 370.0, 1000.0, 683.0, 360.0, 1000.0, 518.0, 882.0, 427.0]
2025-09-12 23:40:01,697 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 87/100 (estimated time remaining: 3 hours, 41 minutes, 31 seconds)
2025-09-12 23:52:52,925 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:52:52,931 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 23:56:30,489 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 2293.86816 ± 846.641
2025-09-12 23:56:30,490 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [2007.4747, 1711.8501, 3287.1614, 1663.9297, 904.9254, 3247.696, 3298.1963, 1618.2124, 1939.0538, 3260.1821]
2025-09-12 23:56:30,490 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [631.0, 524.0, 1000.0, 516.0, 303.0, 1000.0, 1000.0, 506.0, 592.0, 1000.0]
2025-09-12 23:56:30,491 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1226 [INFO]: New best (2293.87) for latency MM1Queue_a033_s075
2025-09-12 23:56:30,518 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 88/100 (estimated time remaining: 3 hours, 28 minutes, 19 seconds)
2025-09-13 00:09:13,474 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:09:13,478 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 00:12:13,854 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 1928.26978 ± 1014.506
2025-09-13 00:12:13,856 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [941.1751, 3489.1365, 1260.6581, 2130.5295, 779.26636, 3318.887, 1414.2323, 958.0709, 1691.5685, 3299.1746]
2025-09-13 00:12:13,856 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [303.0, 1000.0, 404.0, 642.0, 265.0, 1000.0, 459.0, 315.0, 534.0, 1000.0]
2025-09-13 00:12:13,901 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 89/100 (estimated time remaining: 3 hours, 7 minutes, 57 seconds)
2025-09-13 00:25:12,456 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:25:12,461 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 00:28:43,173 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 2232.83789 ± 1033.874
2025-09-13 00:28:43,174 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [3267.6128, 1573.5392, 3150.1921, 544.24835, 3217.6172, 1851.864, 891.28455, 1431.1921, 3374.9326, 3025.8972]
2025-09-13 00:28:43,174 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 462.0, 943.0, 202.0, 1000.0, 570.0, 318.0, 426.0, 1000.0, 906.0]
2025-09-13 00:28:43,184 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 90/100 (estimated time remaining: 2 hours, 56 minutes, 51 seconds)
2025-09-13 00:41:21,466 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:41:21,471 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 00:45:09,875 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 2349.52075 ± 1076.336
2025-09-13 00:45:09,882 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [2933.6606, 3282.9397, 3102.3591, 3228.6921, 3307.5537, 1225.4272, 3240.0544, 1188.6686, 275.26968, 1710.5814]
2025-09-13 00:45:09,882 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [919.0, 1000.0, 1000.0, 1000.0, 1000.0, 414.0, 994.0, 427.0, 124.0, 560.0]
2025-09-13 00:45:09,882 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1226 [INFO]: New best (2349.52) for latency MM1Queue_a033_s075
2025-09-13 00:45:09,915 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 91/100 (estimated time remaining: 2 hours, 42 minutes, 1 second)
2025-09-13 00:58:33,859 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:58:33,865 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 01:01:10,668 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 1630.56128 ± 990.372
2025-09-13 01:01:10,686 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [2085.4773, 3357.3389, 926.0178, 144.08034, 1097.9565, 1678.2754, 3367.3088, 1232.6915, 1427.7748, 988.69226]
2025-09-13 01:01:10,686 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [662.0, 1000.0, 308.0, 77.0, 406.0, 510.0, 1000.0, 380.0, 457.0, 333.0]
2025-09-13 01:01:10,699 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 92/100 (estimated time remaining: 2 hours, 26 minutes, 4 seconds)
2025-09-13 01:14:11,801 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:14:11,808 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 01:17:19,512 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 1960.31812 ± 1044.242
2025-09-13 01:17:19,514 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [979.9754, 2130.609, 3350.6287, 2238.4714, 3410.4302, 3414.5742, 1329.5905, 1023.6159, 836.98254, 888.3045]
2025-09-13 01:17:19,514 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [339.0, 669.0, 1000.0, 668.0, 990.0, 1000.0, 427.0, 347.0, 292.0, 314.0]
2025-09-13 01:17:19,558 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 93/100 (estimated time remaining: 2 hours, 9 minutes, 18 seconds)
2025-09-13 01:30:27,264 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:30:27,270 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 01:35:07,254 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 2890.60596 ± 657.282
2025-09-13 01:35:07,255 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [3242.9783, 3298.6418, 2560.4617, 3219.1897, 3166.4126, 3195.2183, 3217.8706, 2829.9385, 1028.538, 3146.8096]
2025-09-13 01:35:07,255 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 800.0, 1000.0, 1000.0, 1000.0, 1000.0, 867.0, 349.0, 1000.0]
2025-09-13 01:35:07,255 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1226 [INFO]: New best (2890.61) for latency MM1Queue_a033_s075
2025-09-13 01:35:07,265 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 56 minutes, 2 seconds)
2025-09-13 01:47:47,516 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:47:47,521 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 01:51:42,132 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 2489.22681 ± 1072.820
2025-09-13 01:51:42,134 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [3261.299, 2206.0002, 962.94147, 3356.3723, 2806.3252, 2732.9905, 51.706, 3249.3901, 2783.064, 3482.177]
2025-09-13 01:51:42,134 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 688.0, 329.0, 1000.0, 883.0, 851.0, 111.0, 1000.0, 821.0, 1000.0]
2025-09-13 01:51:42,148 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 39 minutes, 34 seconds)
2025-09-13 02:04:26,316 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:04:26,321 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 02:09:08,137 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 3036.03564 ± 631.823
2025-09-13 02:09:08,139 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1360.3154, 3443.0125, 3217.4885, 2377.0337, 3429.8013, 3364.4868, 3234.7024, 3356.971, 3384.3828, 3192.1616]
2025-09-13 02:09:08,139 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [428.0, 1000.0, 1000.0, 712.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 02:09:08,139 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1226 [INFO]: New best (3036.04) for latency MM1Queue_a033_s075
2025-09-13 02:09:08,162 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 23 minutes, 58 seconds)
2025-09-13 02:21:45,636 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:21:45,641 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 02:25:32,591 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 2470.33960 ± 1152.030
2025-09-13 02:25:32,591 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [3074.0227, 3288.105, 1171.774, 3303.6692, 2776.1992, 3287.4739, 3334.8445, 210.2584, 3354.8147, 902.2343]
2025-09-13 02:25:32,591 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [909.0, 1000.0, 374.0, 1000.0, 827.0, 937.0, 1000.0, 97.0, 1000.0, 297.0]
2025-09-13 02:25:32,606 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 97/100 (estimated time remaining: 1 hour, 7 minutes, 29 seconds)
2025-09-13 02:38:02,802 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:38:02,807 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 02:42:11,146 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 2751.73584 ± 853.552
2025-09-13 02:42:11,163 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [3001.1821, 885.5623, 1741.3307, 2305.636, 3435.449, 3447.7146, 2376.4214, 3415.6096, 3441.3672, 3467.0842]
2025-09-13 02:42:11,164 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [862.0, 303.0, 538.0, 699.0, 1000.0, 1000.0, 734.0, 1000.0, 1000.0, 1000.0]
2025-09-13 02:42:11,206 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 98/100 (estimated time remaining: 50 minutes, 54 seconds)
2025-09-13 02:55:04,011 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:55:04,015 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 02:59:38,496 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 2881.45239 ± 740.493
2025-09-13 02:59:38,511 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [3283.1025, 3215.635, 2030.8114, 3241.312, 3296.362, 3132.2942, 940.5931, 3242.5728, 3238.8806, 3192.9597]
2025-09-13 02:59:38,511 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 583.0, 1000.0, 1000.0, 1000.0, 323.0, 1000.0, 1000.0, 1000.0]
2025-09-13 02:59:38,526 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 99/100 (estimated time remaining: 33 minutes, 48 seconds)
2025-09-13 03:13:18,946 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:13:18,953 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 03:17:26,469 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 2630.43604 ± 840.847
2025-09-13 03:17:26,486 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [2252.1968, 2740.5293, 2686.2224, 3334.924, 3282.21, 1950.5204, 2989.9507, 500.55304, 3214.4253, 3352.8286]
2025-09-13 03:17:26,486 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [687.0, 854.0, 826.0, 1000.0, 1000.0, 581.0, 881.0, 182.0, 1000.0, 1000.0]
2025-09-13 03:17:26,517 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 100/100 (estimated time remaining: 17 minutes, 8 seconds)
2025-09-13 03:29:23,538 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:29:23,544 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 03:34:03,198 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 3007.46729 ± 596.437
2025-09-13 03:34:03,203 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [3108.0386, 3343.1338, 3347.8662, 3360.348, 3509.1577, 2519.681, 1446.8281, 2774.6155, 3290.9722, 3374.033]
2025-09-13 03:34:03,203 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [937.0, 1000.0, 1000.0, 1000.0, 1000.0, 749.0, 475.0, 847.0, 1000.0, 1000.0]
2025-09-13 03:34:03,242 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1251 [DEBUG]: Training session finished
