2025-09-12 03:20:42,395 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noiseperc25-walker2d/MM1Queue_a033_s075-mbpac_memdelay
2025-09-12 03:20:42,395 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noiseperc25-walker2d/MM1Queue_a033_s075-mbpac_memdelay
2025-09-12 03:20:42,395 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x154ceb6edf10>}
2025-09-12 03:20:42,395 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1111 [DEBUG]: using device: cuda
2025-09-12 03:20:42,401 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1133 [INFO]: Creating new trainer
2025-09-12 03:20:42,407 baseline-mbpac-noiseperc25-walker2d:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-09-12 03:20:42,407 baseline-mbpac-noiseperc25-walker2d:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-12 03:20:42,415 baseline-mbpac-noiseperc25-walker2d:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=6, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 384, batch_first=True)
)
2025-09-12 03:20:43,463 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1194 [DEBUG]: Starting training session...
2025-09-12 03:20:43,463 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 1/100
2025-09-12 03:31:11,549 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:31:11,551 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:31:50,818 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 33.41266 ± 101.001
2025-09-12 03:31:50,848 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [2.0595613, -21.942913, -61.145164, 222.81094, -16.949478, 19.949116, -29.033346, 6.452328, -26.855703, 238.78128]
2025-09-12 03:31:50,848 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [201.0, 128.0, 112.0, 173.0, 100.0, 168.0, 95.0, 105.0, 141.0, 139.0]
2025-09-12 03:31:50,848 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1226 [INFO]: New best (33.41) for latency MM1Queue_a033_s075
2025-09-12 03:31:50,854 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 2/100 (estimated time remaining: 18 hours, 21 minutes, 11 seconds)
2025-09-12 03:43:54,786 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:43:54,788 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:44:17,029 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: -0.45918 ± 20.150
2025-09-12 03:44:17,029 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [-1.2359835, 20.319166, 3.9724247, 7.47867, 10.937685, -27.037207, -23.365065, 10.657392, -35.14434, 28.825457]
2025-09-12 03:44:17,029 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [38.0, 84.0, 43.0, 21.0, 73.0, 95.0, 76.0, 138.0, 123.0, 93.0]
2025-09-12 03:44:17,034 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 3/100 (estimated time remaining: 19 hours, 14 minutes, 25 seconds)
2025-09-12 03:56:15,450 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:56:15,459 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:56:39,559 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 18.73069 ± 18.165
2025-09-12 03:56:39,559 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [25.814064, 30.845001, 11.706045, -5.56776, -13.314065, 14.251918, 54.352573, 19.250137, 19.127516, 30.841465]
2025-09-12 03:56:39,559 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [66.0, 69.0, 73.0, 108.0, 107.0, 45.0, 179.0, 75.0, 47.0, 61.0]
2025-09-12 03:56:39,568 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 4/100 (estimated time remaining: 19 hours, 21 minutes, 54 seconds)
2025-09-12 04:08:44,343 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:08:44,353 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:09:12,917 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 44.93832 ± 53.504
2025-09-12 04:09:12,917 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [192.59589, 23.584904, 37.891945, 38.221027, 79.07334, 21.183477, 37.553207, 3.8985267, 11.582104, 3.7988114]
2025-09-12 04:09:12,917 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [313.0, 148.0, 123.0, 43.0, 123.0, 53.0, 86.0, 14.0, 61.0, 20.0]
2025-09-12 04:09:12,917 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1226 [INFO]: New best (44.94) for latency MM1Queue_a033_s075
2025-09-12 04:09:12,927 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 5/100 (estimated time remaining: 19 hours, 23 minutes, 47 seconds)
2025-09-12 04:21:25,206 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:21:25,229 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:22:03,858 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 124.26906 ± 128.834
2025-09-12 04:22:03,858 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [71.7126, 24.133785, 62.001278, 161.54024, 45.54949, 41.4756, 426.14066, 18.600338, 301.50208, 90.03437]
2025-09-12 04:22:03,858 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [146.0, 39.0, 151.0, 180.0, 52.0, 138.0, 265.0, 33.0, 189.0, 140.0]
2025-09-12 04:22:03,858 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1226 [INFO]: New best (124.27) for latency MM1Queue_a033_s075
2025-09-12 04:22:03,869 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 6/100 (estimated time remaining: 19 hours, 25 minutes, 27 seconds)
2025-09-12 04:34:14,885 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:34:14,908 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:34:49,317 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 145.59215 ± 124.081
2025-09-12 04:34:49,317 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [92.795204, 323.7698, 98.81741, 3.354656, 142.04005, 269.51456, 21.061718, 14.926826, 365.99078, 123.65054]
2025-09-12 04:34:49,317 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [136.0, 197.0, 90.0, 13.0, 212.0, 134.0, 49.0, 27.0, 211.0, 124.0]
2025-09-12 04:34:49,317 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1226 [INFO]: New best (145.59) for latency MM1Queue_a033_s075
2025-09-12 04:34:49,346 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 7/100 (estimated time remaining: 19 hours, 43 minutes, 55 seconds)
2025-09-12 04:46:54,988 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:46:54,991 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:47:30,604 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 148.14606 ± 156.975
2025-09-12 04:47:30,604 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [139.81595, 234.95886, 25.683151, 50.43506, 364.41733, 29.191101, 42.41339, 40.921417, 497.66254, 55.96185]
2025-09-12 04:47:30,604 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [94.0, 143.0, 59.0, 99.0, 255.0, 50.0, 120.0, 113.0, 218.0, 71.0]
2025-09-12 04:47:30,604 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1226 [INFO]: New best (148.15) for latency MM1Queue_a033_s075
2025-09-12 04:47:30,621 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 8/100 (estimated time remaining: 19 hours, 36 minutes)
2025-09-12 04:59:38,451 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:59:38,454 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:00:30,806 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 231.48865 ± 163.989
2025-09-12 05:00:30,807 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [217.50035, 23.763676, 161.4078, 89.78672, 423.71393, 217.27687, 150.67458, 93.34803, 581.1291, 356.28534]
2025-09-12 05:00:30,807 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [133.0, 39.0, 204.0, 133.0, 231.0, 128.0, 157.0, 169.0, 403.0, 223.0]
2025-09-12 05:00:30,807 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1226 [INFO]: New best (231.49) for latency MM1Queue_a033_s075
2025-09-12 05:00:30,821 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 9/100 (estimated time remaining: 19 hours, 34 minutes, 55 seconds)
2025-09-12 05:12:32,924 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:12:32,932 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:13:16,386 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 192.97232 ± 186.465
2025-09-12 05:13:16,386 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [56.610836, 505.33792, 303.2301, 127.74005, 32.937824, 16.417217, 44.95623, 312.70935, 501.26138, 28.522308]
2025-09-12 05:13:16,386 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [66.0, 360.0, 199.0, 137.0, 46.0, 24.0, 61.0, 186.0, 361.0, 76.0]
2025-09-12 05:13:16,401 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 10/100 (estimated time remaining: 19 hours, 25 minutes, 51 seconds)
2025-09-12 05:25:30,135 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:25:30,137 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:26:13,608 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 207.32645 ± 147.455
2025-09-12 05:26:13,608 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [27.951569, 6.3463664, 262.73798, 5.5909505, 332.47537, 380.6563, 197.50006, 412.93597, 302.56528, 144.50443]
2025-09-12 05:26:13,608 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [53.0, 16.0, 153.0, 14.0, 231.0, 228.0, 255.0, 209.0, 179.0, 174.0]
2025-09-12 05:26:13,617 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 11/100 (estimated time remaining: 19 hours, 14 minutes, 55 seconds)
2025-09-12 05:38:24,957 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:38:24,961 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:38:55,519 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 160.87752 ± 163.094
2025-09-12 05:38:55,519 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [41.02971, 8.068429, 4.9237895, 133.00743, 490.8839, 15.409554, 34.054825, 250.43005, 340.07953, 290.88794]
2025-09-12 05:38:55,519 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [54.0, 22.0, 14.0, 163.0, 232.0, 30.0, 44.0, 161.0, 164.0, 183.0]
2025-09-12 05:38:55,535 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 12/100 (estimated time remaining: 19 hours, 1 minute, 2 seconds)
2025-09-12 05:51:02,334 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:51:02,336 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:51:42,633 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 253.90031 ± 186.090
2025-09-12 05:51:42,633 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [425.0188, 527.27527, 493.47174, 12.336046, 296.30386, 40.558792, 119.72314, 360.47906, 31.679453, 232.1569]
2025-09-12 05:51:42,633 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [223.0, 295.0, 191.0, 23.0, 138.0, 46.0, 127.0, 167.0, 50.0, 134.0]
2025-09-12 05:51:42,633 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1226 [INFO]: New best (253.90) for latency MM1Queue_a033_s075
2025-09-12 05:51:42,673 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 13/100 (estimated time remaining: 18 hours, 49 minutes, 56 seconds)
2025-09-12 06:03:50,232 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:03:50,234 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:04:42,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 301.92303 ± 165.328
2025-09-12 06:04:42,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [264.2619, 327.6837, 414.30405, 260.56122, 444.46182, 61.60407, 620.96014, 24.487312, 306.7795, 294.12668]
2025-09-12 06:04:42,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [146.0, 213.0, 235.0, 178.0, 342.0, 82.0, 262.0, 71.0, 164.0, 150.0]
2025-09-12 06:04:42,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1226 [INFO]: New best (301.92) for latency MM1Queue_a033_s075
2025-09-12 06:04:42,976 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 14/100 (estimated time remaining: 18 hours, 37 minutes, 7 seconds)
2025-09-12 06:17:00,493 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:17:00,496 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:17:51,188 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 300.72485 ± 114.065
2025-09-12 06:17:51,188 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [375.44788, 58.341072, 209.74124, 382.29346, 286.52957, 415.1945, 177.94041, 304.59772, 357.98865, 439.17374]
2025-09-12 06:17:51,188 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [198.0, 71.0, 158.0, 187.0, 196.0, 305.0, 107.0, 144.0, 178.0, 199.0]
2025-09-12 06:17:51,204 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 15/100 (estimated time remaining: 18 hours, 30 minutes, 46 seconds)
2025-09-12 06:30:06,458 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:30:06,466 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:30:35,855 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 138.78284 ± 143.957
2025-09-12 06:30:35,855 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [47.809734, 293.26178, 34.69294, 58.289436, 3.4768386, 299.18106, 446.48004, 119.36054, 52.76162, 32.51439]
2025-09-12 06:30:35,855 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [62.0, 209.0, 45.0, 58.0, 18.0, 157.0, 220.0, 131.0, 64.0, 37.0]
2025-09-12 06:30:35,860 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 16/100 (estimated time remaining: 18 hours, 14 minutes, 18 seconds)
2025-09-12 06:42:32,425 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:42:32,427 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:43:13,288 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 269.32727 ± 167.836
2025-09-12 06:43:13,289 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [306.6586, 35.525154, 388.6603, 28.396109, 356.88684, 28.63881, 325.52536, 542.20026, 336.56934, 344.21198]
2025-09-12 06:43:13,289 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [152.0, 51.0, 179.0, 48.0, 252.0, 53.0, 164.0, 196.0, 166.0, 153.0]
2025-09-12 06:43:13,295 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 17/100 (estimated time remaining: 18 hours, 10 seconds)
2025-09-12 06:55:29,528 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:55:29,529 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:56:23,739 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 350.50766 ± 94.466
2025-09-12 06:56:23,739 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [470.83127, 265.06033, 305.00595, 433.1099, 472.90845, 227.11707, 326.6197, 200.54045, 403.79953, 400.084]
2025-09-12 06:56:23,739 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [217.0, 163.0, 164.0, 220.0, 300.0, 143.0, 151.0, 113.0, 183.0, 201.0]
2025-09-12 06:56:23,739 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1226 [INFO]: New best (350.51) for latency MM1Queue_a033_s075
2025-09-12 06:56:23,747 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 18/100 (estimated time remaining: 17 hours, 53 minutes, 45 seconds)
2025-09-12 07:08:30,347 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:08:30,349 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:09:15,780 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 312.80786 ± 150.208
2025-09-12 07:09:15,781 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [334.85022, 258.1578, 406.47986, 73.5352, 370.20575, 550.79065, 31.965364, 308.8138, 356.31094, 436.96863]
2025-09-12 07:09:15,781 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [194.0, 137.0, 257.0, 75.0, 195.0, 197.0, 47.0, 151.0, 179.0, 185.0]
2025-09-12 07:09:15,786 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 19/100 (estimated time remaining: 17 hours, 38 minutes, 34 seconds)
2025-09-12 07:21:25,898 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:21:25,899 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:22:01,515 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 237.22324 ± 171.620
2025-09-12 07:22:01,515 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [314.84772, 322.1411, 7.5363946, 35.3797, 83.08952, 420.09647, 411.0581, 130.874, 516.1522, 131.0571]
2025-09-12 07:22:01,515 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [159.0, 137.0, 18.0, 46.0, 68.0, 173.0, 214.0, 102.0, 213.0, 104.0]
2025-09-12 07:22:01,524 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 20/100 (estimated time remaining: 17 hours, 19 minutes, 35 seconds)
2025-09-12 07:34:05,916 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:34:05,918 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:34:38,194 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 218.21123 ± 125.463
2025-09-12 07:34:38,194 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [41.93722, 352.46823, 358.18997, 47.567574, 19.038044, 283.10455, 226.82663, 245.9797, 316.2166, 290.7837]
2025-09-12 07:34:38,194 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [50.0, 151.0, 144.0, 57.0, 32.0, 164.0, 125.0, 118.0, 135.0, 130.0]
2025-09-12 07:34:38,199 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 21/100 (estimated time remaining: 17 hours, 4 minutes, 37 seconds)
2025-09-12 07:46:54,320 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:46:54,323 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:47:38,133 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 311.03879 ± 92.159
2025-09-12 07:47:38,133 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [239.92023, 269.4439, 369.16873, 353.8332, 389.78864, 306.3581, 93.99954, 285.13107, 365.22372, 437.52075]
2025-09-12 07:47:38,133 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [132.0, 144.0, 164.0, 172.0, 142.0, 140.0, 124.0, 139.0, 161.0, 206.0]
2025-09-12 07:47:38,141 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 22/100 (estimated time remaining: 16 hours, 57 minutes, 44 seconds)
2025-09-12 07:59:47,393 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:59:47,395 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:00:23,806 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 237.98514 ± 158.142
2025-09-12 08:00:23,806 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [247.51741, 253.47601, 334.0117, 245.5476, 35.94304, 86.66275, 5.1059794, 551.79333, 392.69467, 227.09882]
2025-09-12 08:00:23,806 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [119.0, 140.0, 147.0, 132.0, 43.0, 75.0, 15.0, 274.0, 181.0, 134.0]
2025-09-12 08:00:23,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 23/100 (estimated time remaining: 16 hours, 38 minutes, 25 seconds)
2025-09-12 08:12:36,098 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:12:36,101 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:13:05,711 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 187.01859 ± 132.568
2025-09-12 08:13:05,711 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [200.87823, 160.39238, 362.88773, 6.428226, 54.34023, 7.148347, 222.40927, 232.91164, 427.67993, 195.10986]
2025-09-12 08:13:05,711 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [94.0, 101.0, 173.0, 15.0, 67.0, 17.0, 145.0, 110.0, 178.0, 117.0]
2025-09-12 08:13:05,738 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 24/100 (estimated time remaining: 16 hours, 23 minutes, 1 second)
2025-09-12 08:25:17,771 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:25:17,773 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:26:04,085 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 330.13727 ± 132.871
2025-09-12 08:26:04,085 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [490.34274, 307.75894, 319.4446, 227.2472, 397.20218, 3.1672447, 482.9773, 359.6111, 320.61642, 393.00485]
2025-09-12 08:26:04,085 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [259.0, 148.0, 160.0, 115.0, 182.0, 14.0, 223.0, 149.0, 178.0, 175.0]
2025-09-12 08:26:04,093 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 25/100 (estimated time remaining: 16 hours, 13 minutes, 27 seconds)
2025-09-12 08:38:12,276 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:38:12,278 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:38:57,182 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 308.97806 ± 126.801
2025-09-12 08:38:57,182 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [333.8127, 295.0814, 294.12906, 281.08673, 1.2507471, 231.53673, 331.2815, 440.45312, 386.42163, 494.7271]
2025-09-12 08:38:57,182 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [157.0, 132.0, 133.0, 141.0, 13.0, 120.0, 170.0, 228.0, 215.0, 256.0]
2025-09-12 08:38:57,191 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 26/100 (estimated time remaining: 16 hours, 4 minutes, 44 seconds)
2025-09-12 08:51:14,166 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:51:14,169 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:51:57,335 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 296.53888 ± 138.430
2025-09-12 08:51:57,335 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [545.99506, 343.74417, 447.02344, 354.39468, 203.43907, 309.1859, 283.62396, 5.063565, 238.34207, 234.57701]
2025-09-12 08:51:57,335 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [271.0, 186.0, 155.0, 175.0, 110.0, 157.0, 164.0, 14.0, 132.0, 120.0]
2025-09-12 08:51:57,345 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 27/100 (estimated time remaining: 15 hours, 51 minutes, 56 seconds)
2025-09-12 09:04:01,182 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:04:01,192 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:04:36,733 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 254.55037 ± 132.910
2025-09-12 09:04:36,734 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [285.62747, 254.40895, 2.893815, 17.366629, 259.24753, 322.47324, 259.13007, 376.97733, 423.06143, 344.31705]
2025-09-12 09:04:36,734 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [127.0, 128.0, 12.0, 29.0, 129.0, 163.0, 114.0, 175.0, 195.0, 149.0]
2025-09-12 09:04:36,740 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 28/100 (estimated time remaining: 15 hours, 37 minutes, 32 seconds)
2025-09-12 09:16:53,948 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:16:53,950 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:17:49,967 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 391.40820 ± 201.279
2025-09-12 09:17:49,968 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [392.77643, 664.6374, 28.874117, 273.37482, 307.32236, 389.5741, 254.10924, 788.7902, 387.66727, 426.95618]
2025-09-12 09:17:49,968 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [185.0, 267.0, 39.0, 127.0, 133.0, 178.0, 121.0, 472.0, 179.0, 217.0]
2025-09-12 09:17:49,968 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1226 [INFO]: New best (391.41) for latency MM1Queue_a033_s075
2025-09-12 09:17:49,980 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 29/100 (estimated time remaining: 15 hours, 32 minutes, 13 seconds)
2025-09-12 09:29:57,181 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:29:57,190 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:30:43,255 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 355.07535 ± 102.421
2025-09-12 09:30:43,255 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [304.4939, 268.8343, 386.68808, 525.6445, 344.8522, 444.56335, 208.31122, 406.02316, 205.42517, 455.91754]
2025-09-12 09:30:43,255 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [168.0, 129.0, 194.0, 203.0, 152.0, 163.0, 105.0, 161.0, 127.0, 174.0]
2025-09-12 09:30:43,262 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 30/100 (estimated time remaining: 15 hours, 18 minutes, 4 seconds)
2025-09-12 09:43:03,867 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:43:03,870 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:44:07,840 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 462.49390 ± 186.393
2025-09-12 09:44:07,840 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [343.7157, 429.9956, 990.459, 379.16757, 304.06393, 375.40604, 550.03235, 425.9442, 413.1386, 413.01605]
2025-09-12 09:44:07,840 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [162.0, 198.0, 618.0, 152.0, 142.0, 161.0, 224.0, 205.0, 177.0, 177.0]
2025-09-12 09:44:07,840 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1226 [INFO]: New best (462.49) for latency MM1Queue_a033_s075
2025-09-12 09:44:07,847 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 31/100 (estimated time remaining: 15 hours, 12 minutes, 29 seconds)
2025-09-12 09:56:08,913 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:56:08,915 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:57:06,027 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 374.30341 ± 188.709
2025-09-12 09:57:06,027 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [330.59894, 498.28024, 708.83203, 186.59227, 551.63794, 433.94363, 322.86905, 231.748, 18.764616, 459.76752]
2025-09-12 09:57:06,027 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [143.0, 205.0, 340.0, 120.0, 304.0, 251.0, 184.0, 127.0, 27.0, 267.0]
2025-09-12 09:57:06,033 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 32/100 (estimated time remaining: 14 hours, 58 minutes, 59 seconds)
2025-09-12 10:09:23,649 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:09:23,664 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:10:01,916 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 280.50180 ± 205.910
2025-09-12 10:10:01,916 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [178.2773, 318.4306, 318.93546, 4.4346423, 755.5393, 278.74207, 5.4054976, 213.16225, 282.683, 449.40775]
2025-09-12 10:10:01,916 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [96.0, 131.0, 132.0, 14.0, 360.0, 134.0, 13.0, 140.0, 143.0, 166.0]
2025-09-12 10:10:01,922 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 33/100 (estimated time remaining: 14 hours, 49 minutes, 42 seconds)
2025-09-12 10:22:11,961 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:22:11,963 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:22:56,187 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 332.16995 ± 85.282
2025-09-12 10:22:56,187 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [276.7315, 452.06296, 305.36444, 514.4754, 286.61304, 261.7724, 385.22223, 275.46893, 241.78827, 322.20016]
2025-09-12 10:22:56,187 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [116.0, 190.0, 139.0, 236.0, 153.0, 119.0, 147.0, 133.0, 133.0, 158.0]
2025-09-12 10:22:56,196 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 34/100 (estimated time remaining: 14 hours, 32 minutes, 23 seconds)
2025-09-12 10:35:05,004 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:35:05,006 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:35:45,827 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 286.74219 ± 128.032
2025-09-12 10:35:45,827 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [322.24783, 270.5052, 11.6334505, 306.0278, 365.33798, 298.1884, 437.58524, 374.39563, 391.35687, 90.1436]
2025-09-12 10:35:45,827 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [149.0, 129.0, 19.0, 149.0, 184.0, 133.0, 195.0, 159.0, 152.0, 130.0]
2025-09-12 10:35:45,861 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 35/100 (estimated time remaining: 14 hours, 18 minutes, 34 seconds)
2025-09-12 10:48:03,116 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:48:03,118 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:48:49,254 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 379.25620 ± 94.408
2025-09-12 10:48:49,254 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [327.37415, 422.60675, 490.49512, 333.13007, 250.27988, 410.85745, 279.13998, 362.98264, 335.82037, 579.87555]
2025-09-12 10:48:49,254 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [141.0, 185.0, 200.0, 131.0, 148.0, 158.0, 130.0, 151.0, 131.0, 216.0]
2025-09-12 10:48:49,262 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 36/100 (estimated time remaining: 14 hours, 58 seconds)
2025-09-12 11:00:56,704 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:00:56,706 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:01:44,992 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 369.66284 ± 69.056
2025-09-12 11:01:44,992 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [333.94162, 264.8123, 399.93765, 289.4886, 435.86072, 469.4625, 349.3241, 390.88007, 302.1248, 460.79605]
2025-09-12 11:01:44,992 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [130.0, 109.0, 219.0, 140.0, 171.0, 218.0, 207.0, 151.0, 147.0, 173.0]
2025-09-12 11:01:45,011 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 37/100 (estimated time remaining: 13 hours, 47 minutes, 30 seconds)
2025-09-12 11:14:07,809 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:14:07,817 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:14:54,736 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 364.94427 ± 174.520
2025-09-12 11:14:54,736 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [681.142, 309.88126, 341.79697, 283.11847, 594.57855, 316.824, 396.9534, 269.99295, 441.46915, 13.685685]
2025-09-12 11:14:54,736 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [251.0, 203.0, 172.0, 125.0, 237.0, 148.0, 174.0, 120.0, 165.0, 23.0]
2025-09-12 11:14:54,763 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 38/100 (estimated time remaining: 13 hours, 37 minutes, 29 seconds)
2025-09-12 11:27:00,765 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:27:00,766 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:27:49,810 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 449.33813 ± 177.932
2025-09-12 11:27:49,810 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [515.12115, 55.332848, 562.9245, 494.38007, 256.4085, 508.47583, 464.5804, 330.28622, 607.81964, 698.0524]
2025-09-12 11:27:49,810 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [189.0, 57.0, 193.0, 190.0, 132.0, 181.0, 174.0, 137.0, 217.0, 230.0]
2025-09-12 11:27:49,830 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 39/100 (estimated time remaining: 13 hours, 24 minutes, 41 seconds)
2025-09-12 11:40:03,301 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:40:03,303 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:40:51,977 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 390.99026 ± 105.026
2025-09-12 11:40:51,977 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [196.22156, 231.70618, 417.73718, 366.55466, 383.5912, 445.15918, 363.71414, 539.6955, 521.46545, 444.0574]
2025-09-12 11:40:51,977 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [107.0, 123.0, 159.0, 160.0, 158.0, 211.0, 187.0, 204.0, 209.0, 164.0]
2025-09-12 11:40:51,988 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 40/100 (estimated time remaining: 13 hours, 14 minutes, 14 seconds)
2025-09-12 11:53:02,522 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:53:02,524 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:53:49,869 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 400.84589 ± 114.730
2025-09-12 11:53:49,870 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [298.3377, 444.44073, 304.56213, 523.0653, 426.02805, 566.46857, 364.4685, 338.82834, 197.90686, 544.3527]
2025-09-12 11:53:49,870 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [121.0, 180.0, 150.0, 218.0, 167.0, 198.0, 149.0, 159.0, 110.0, 182.0]
2025-09-12 11:53:49,879 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 41/100 (estimated time remaining: 13 hours, 7 seconds)
2025-09-12 12:06:08,649 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:06:08,651 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:07:03,333 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 494.62451 ± 102.125
2025-09-12 12:07:03,334 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [552.92944, 329.36383, 462.01385, 324.78192, 554.9153, 534.0063, 604.74066, 644.72833, 433.61682, 505.14862]
2025-09-12 12:07:03,334 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [213.0, 151.0, 186.0, 135.0, 212.0, 196.0, 203.0, 241.0, 160.0, 207.0]
2025-09-12 12:07:03,334 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1226 [INFO]: New best (494.62) for latency MM1Queue_a033_s075
2025-09-12 12:07:03,346 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 42/100 (estimated time remaining: 12 hours, 50 minutes, 36 seconds)
2025-09-12 12:19:07,908 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:19:07,910 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:20:06,563 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 502.19952 ± 181.740
2025-09-12 12:20:06,563 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [434.48874, 595.66394, 447.68994, 826.85455, 616.1927, 106.54496, 372.46216, 645.1894, 461.91922, 514.98926]
2025-09-12 12:20:06,563 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [237.0, 242.0, 179.0, 356.0, 191.0, 81.0, 148.0, 231.0, 178.0, 178.0]
2025-09-12 12:20:06,563 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1226 [INFO]: New best (502.20) for latency MM1Queue_a033_s075
2025-09-12 12:20:06,574 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 43/100 (estimated time remaining: 12 hours, 36 minutes, 17 seconds)
2025-09-12 12:32:17,107 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:32:17,110 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:33:06,294 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 421.87567 ± 189.164
2025-09-12 12:33:06,295 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [282.80197, 436.05017, 402.4571, 14.393415, 637.8943, 664.24255, 352.5989, 400.62854, 662.66534, 365.02478]
2025-09-12 12:33:06,295 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [130.0, 171.0, 166.0, 36.0, 224.0, 253.0, 147.0, 155.0, 241.0, 175.0]
2025-09-12 12:33:06,306 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 44/100 (estimated time remaining: 12 hours, 24 minutes, 7 seconds)
2025-09-12 12:45:22,793 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:45:22,795 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:46:14,005 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 469.36603 ± 173.569
2025-09-12 12:46:14,006 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [596.28015, 544.8546, 546.6459, 667.51215, 529.1205, 430.55353, 23.326462, 316.0949, 490.9305, 548.3414]
2025-09-12 12:46:14,006 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [225.0, 189.0, 208.0, 258.0, 210.0, 151.0, 34.0, 131.0, 174.0, 186.0]
2025-09-12 12:46:14,020 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 45/100 (estimated time remaining: 12 hours, 12 minutes, 6 seconds)
2025-09-12 12:58:26,689 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:58:26,691 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:59:26,181 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 534.33606 ± 104.288
2025-09-12 12:59:26,181 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [427.14017, 644.54285, 416.77838, 613.5564, 549.13403, 649.19727, 409.9712, 615.7291, 627.40814, 389.9026]
2025-09-12 12:59:26,181 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [179.0, 243.0, 158.0, 227.0, 226.0, 237.0, 158.0, 239.0, 238.0, 163.0]
2025-09-12 12:59:26,181 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1226 [INFO]: New best (534.34) for latency MM1Queue_a033_s075
2025-09-12 12:59:26,193 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 46/100 (estimated time remaining: 12 hours, 1 minute, 39 seconds)
2025-09-12 13:11:44,112 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:11:44,114 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:12:55,117 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 587.61804 ± 185.598
2025-09-12 13:12:55,119 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [520.9218, 446.74005, 406.91843, 590.44714, 776.16205, 996.5931, 615.9142, 697.9336, 483.8313, 340.7187]
2025-09-12 13:12:55,119 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [193.0, 210.0, 209.0, 304.0, 311.0, 363.0, 230.0, 239.0, 194.0, 156.0]
2025-09-12 13:12:55,119 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1226 [INFO]: New best (587.62) for latency MM1Queue_a033_s075
2025-09-12 13:12:55,129 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 47/100 (estimated time remaining: 11 hours, 51 minutes, 19 seconds)
2025-09-12 13:25:09,114 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:25:09,116 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:26:02,967 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 469.70132 ± 95.216
2025-09-12 13:26:02,967 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [429.2178, 449.49338, 567.94073, 322.6683, 465.52695, 392.56226, 466.52716, 447.02228, 695.0358, 461.0186]
2025-09-12 13:26:02,967 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [169.0, 185.0, 248.0, 143.0, 184.0, 160.0, 169.0, 182.0, 236.0, 184.0]
2025-09-12 13:26:02,979 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 48/100 (estimated time remaining: 11 hours, 38 minutes, 57 seconds)
2025-09-12 13:38:16,105 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:38:16,106 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:39:13,554 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 507.27936 ± 116.214
2025-09-12 13:39:13,554 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [490.2222, 478.84018, 399.0246, 441.60983, 474.28946, 750.4623, 337.3856, 490.40152, 536.92865, 673.62946]
2025-09-12 13:39:13,554 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [211.0, 175.0, 171.0, 154.0, 188.0, 254.0, 158.0, 221.0, 181.0, 279.0]
2025-09-12 13:39:13,562 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 49/100 (estimated time remaining: 11 hours, 27 minutes, 39 seconds)
2025-09-12 13:51:24,442 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:51:24,444 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:52:29,999 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 569.86975 ± 98.925
2025-09-12 13:52:30,001 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [816.50195, 463.77405, 442.14188, 519.5093, 586.6294, 554.0534, 588.18756, 547.4212, 638.85345, 541.6255]
2025-09-12 13:52:30,001 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [334.0, 194.0, 179.0, 186.0, 237.0, 240.0, 224.0, 220.0, 249.0, 198.0]
2025-09-12 13:52:30,009 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 50/100 (estimated time remaining: 11 hours, 15 minutes, 55 seconds)
2025-09-12 14:04:46,313 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:04:46,316 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:05:51,290 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 560.98999 ± 193.815
2025-09-12 14:05:51,311 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [624.6176, 584.84454, 683.29266, 760.72845, 25.568848, 645.8053, 467.2497, 667.0664, 605.7647, 544.96124]
2025-09-12 14:05:51,311 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [349.0, 238.0, 257.0, 287.0, 31.0, 230.0, 193.0, 243.0, 211.0, 215.0]
2025-09-12 14:05:51,324 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 51/100 (estimated time remaining: 11 hours, 4 minutes, 11 seconds)
2025-09-12 14:17:56,996 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:17:56,997 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:19:03,423 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 610.27991 ± 86.195
2025-09-12 14:19:03,423 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [464.2195, 641.03754, 633.9053, 570.92957, 729.6299, 571.93616, 631.55524, 585.83966, 510.9159, 762.83057]
2025-09-12 14:19:03,423 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [169.0, 261.0, 231.0, 232.0, 236.0, 208.0, 255.0, 220.0, 187.0, 289.0]
2025-09-12 14:19:03,423 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1226 [INFO]: New best (610.28) for latency MM1Queue_a033_s075
2025-09-12 14:19:03,432 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 52/100 (estimated time remaining: 10 hours, 48 minutes, 9 seconds)
2025-09-12 14:31:17,465 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:31:17,483 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:32:22,239 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 557.09924 ± 78.138
2025-09-12 14:32:22,242 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [584.1516, 600.5663, 507.9459, 474.89975, 683.1323, 650.5438, 434.30652, 546.9756, 611.8131, 476.65833]
2025-09-12 14:32:22,242 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [217.0, 242.0, 218.0, 188.0, 275.0, 288.0, 188.0, 188.0, 255.0, 170.0]
2025-09-12 14:32:22,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 53/100 (estimated time remaining: 10 hours, 36 minutes, 41 seconds)
2025-09-12 14:44:46,056 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:44:46,058 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:45:45,889 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 529.04211 ± 100.447
2025-09-12 14:45:45,889 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [407.3915, 483.22882, 685.25214, 584.77716, 475.7776, 424.70062, 482.6145, 629.59766, 438.62433, 678.4565]
2025-09-12 14:45:45,889 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [188.0, 190.0, 299.0, 229.0, 163.0, 208.0, 177.0, 222.0, 157.0, 226.0]
2025-09-12 14:45:45,927 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 54/100 (estimated time remaining: 10 hours, 25 minutes, 28 seconds)
2025-09-12 14:58:02,090 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:58:02,093 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:58:48,642 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 400.28995 ± 256.686
2025-09-12 14:58:48,643 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [657.7968, 457.28326, 7.8490973, 11.364576, 551.7536, 451.0766, 48.783493, 649.6347, 531.07666, 636.28064]
2025-09-12 14:58:48,643 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [248.0, 182.0, 19.0, 24.0, 198.0, 206.0, 48.0, 252.0, 215.0, 232.0]
2025-09-12 14:58:48,655 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 55/100 (estimated time remaining: 10 hours, 10 minutes, 3 seconds)
2025-09-12 15:10:58,965 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:10:58,968 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:11:57,848 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 541.82733 ± 207.031
2025-09-12 15:11:57,849 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [28.098236, 754.09344, 447.13498, 562.0262, 769.411, 482.9097, 464.432, 752.2974, 610.2683, 547.6018]
2025-09-12 15:11:57,849 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [36.0, 289.0, 160.0, 220.0, 260.0, 190.0, 177.0, 260.0, 255.0, 190.0]
2025-09-12 15:11:57,859 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 56/100 (estimated time remaining: 9 hours, 54 minutes, 58 seconds)
2025-09-12 15:23:55,681 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:23:55,684 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:24:58,173 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 590.51917 ± 120.710
2025-09-12 15:24:58,173 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [713.01917, 444.5979, 530.985, 523.1618, 478.18414, 748.6324, 613.2274, 428.4161, 772.33795, 652.62976]
2025-09-12 15:24:58,173 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [267.0, 197.0, 198.0, 194.0, 167.0, 256.0, 230.0, 181.0, 252.0, 212.0]
2025-09-12 15:24:58,197 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 57/100 (estimated time remaining: 9 hours, 40 minutes, 1 second)
2025-09-12 15:37:12,460 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:37:12,462 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:38:20,370 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 591.12634 ± 180.806
2025-09-12 15:38:20,375 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [750.5865, 535.7457, 597.51294, 549.2068, 793.2579, 641.51294, 113.94014, 737.43066, 641.09485, 550.9757]
2025-09-12 15:38:20,375 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [300.0, 203.0, 246.0, 216.0, 304.0, 224.0, 69.0, 343.0, 246.0, 195.0]
2025-09-12 15:38:20,391 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 58/100 (estimated time remaining: 9 hours, 27 minutes, 19 seconds)
2025-09-12 15:50:44,830 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:50:44,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:52:05,453 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 694.94025 ± 210.832
2025-09-12 15:52:05,455 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [633.8277, 536.0757, 748.7898, 582.0862, 547.81946, 549.7251, 601.9705, 752.0652, 1282.1871, 714.85583]
2025-09-12 15:52:05,455 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [238.0, 218.0, 266.0, 203.0, 213.0, 219.0, 226.0, 271.0, 638.0, 296.0]
2025-09-12 15:52:05,455 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1226 [INFO]: New best (694.94) for latency MM1Queue_a033_s075
2025-09-12 15:52:05,464 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 59/100 (estimated time remaining: 9 hours, 17 minutes, 8 seconds)
2025-09-12 16:04:17,572 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:04:17,575 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:05:21,917 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 610.74982 ± 67.903
2025-09-12 16:05:21,919 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [498.97366, 574.45074, 537.6243, 678.55914, 569.2329, 590.4383, 719.5402, 699.79114, 631.6857, 607.2023]
2025-09-12 16:05:21,919 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [222.0, 208.0, 224.0, 221.0, 203.0, 227.0, 252.0, 231.0, 226.0, 215.0]
2025-09-12 16:05:21,927 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 60/100 (estimated time remaining: 9 hours, 5 minutes, 44 seconds)
2025-09-12 16:17:21,382 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:17:21,384 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:18:30,338 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 650.26282 ± 184.381
2025-09-12 16:18:30,340 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [518.2533, 850.8069, 564.0426, 425.62665, 496.7722, 546.65326, 1079.8325, 628.0523, 717.13574, 675.45306]
2025-09-12 16:18:30,340 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [199.0, 306.0, 209.0, 194.0, 178.0, 199.0, 358.0, 252.0, 245.0, 239.0]
2025-09-12 16:18:30,352 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 61/100 (estimated time remaining: 8 hours, 52 minutes, 19 seconds)
2025-09-12 16:30:46,997 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:30:46,999 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:31:49,886 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 616.54431 ± 81.177
2025-09-12 16:31:49,890 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [671.3779, 526.28796, 724.25836, 571.443, 690.4556, 519.4198, 696.1839, 526.33997, 696.3597, 543.31635]
2025-09-12 16:31:49,890 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [245.0, 207.0, 217.0, 224.0, 243.0, 187.0, 225.0, 179.0, 244.0, 200.0]
2025-09-12 16:31:49,899 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 62/100 (estimated time remaining: 8 hours, 41 minutes, 31 seconds)
2025-09-12 16:44:16,616 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:44:16,618 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:45:23,991 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 617.83344 ± 71.942
2025-09-12 16:45:24,006 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [775.545, 521.1774, 578.724, 678.12115, 641.4648, 602.2862, 619.1893, 567.008, 661.27405, 533.545]
2025-09-12 16:45:24,006 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [256.0, 211.0, 206.0, 257.0, 247.0, 259.0, 233.0, 202.0, 245.0, 208.0]
2025-09-12 16:45:24,019 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 63/100 (estimated time remaining: 8 hours, 29 minutes, 39 seconds)
2025-09-12 16:57:30,302 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:57:30,304 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:58:42,021 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 638.20367 ± 114.443
2025-09-12 16:58:42,022 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [474.32516, 727.1824, 730.3855, 578.60364, 769.9791, 624.0516, 664.01263, 693.9436, 718.69464, 400.85837]
2025-09-12 16:58:42,022 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [260.0, 241.0, 279.0, 209.0, 263.0, 217.0, 239.0, 249.0, 280.0, 226.0]
2025-09-12 16:58:42,033 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 64/100 (estimated time remaining: 8 hours, 12 minutes, 54 seconds)
2025-09-12 17:10:48,238 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:10:48,240 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:11:56,970 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 615.19318 ± 106.437
2025-09-12 17:11:56,971 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [492.6966, 521.3148, 820.02655, 522.79504, 580.6088, 716.03186, 646.19794, 587.1865, 744.9272, 520.1459]
2025-09-12 17:11:56,971 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [185.0, 192.0, 287.0, 209.0, 216.0, 259.0, 254.0, 207.0, 348.0, 216.0]
2025-09-12 17:11:56,984 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 65/100 (estimated time remaining: 7 hours, 59 minutes, 24 seconds)
2025-09-12 17:24:30,614 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:24:30,623 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:25:28,775 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 557.40167 ± 206.103
2025-09-12 17:25:28,775 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [473.76877, 645.284, 536.1711, 12.261609, 558.1587, 490.18347, 635.14307, 723.45966, 761.42316, 738.16296]
2025-09-12 17:25:28,775 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [181.0, 218.0, 179.0, 21.0, 219.0, 181.0, 217.0, 267.0, 280.0, 241.0]
2025-09-12 17:25:28,797 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 66/100 (estimated time remaining: 7 hours, 48 minutes, 49 seconds)
2025-09-12 17:37:28,757 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:37:28,759 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:38:39,980 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 656.39783 ± 130.027
2025-09-12 17:38:39,980 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [768.94183, 553.3006, 635.34064, 672.0666, 628.02936, 551.48285, 756.2152, 566.2014, 948.60254, 483.79776]
2025-09-12 17:38:39,981 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [342.0, 211.0, 231.0, 232.0, 222.0, 215.0, 269.0, 194.0, 331.0, 204.0]
2025-09-12 17:38:39,989 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 67/100 (estimated time remaining: 7 hours, 34 minutes, 28 seconds)
2025-09-12 17:51:03,662 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:51:03,664 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:52:16,118 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 698.01160 ± 124.649
2025-09-12 17:52:16,124 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [956.9929, 811.18054, 786.8794, 687.76984, 634.86835, 484.18103, 610.3491, 660.186, 735.0851, 612.6237]
2025-09-12 17:52:16,124 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [349.0, 293.0, 254.0, 280.0, 218.0, 185.0, 194.0, 239.0, 255.0, 227.0]
2025-09-12 17:52:16,124 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1226 [INFO]: New best (698.01) for latency MM1Queue_a033_s075
2025-09-12 17:52:16,139 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 68/100 (estimated time remaining: 7 hours, 21 minutes, 19 seconds)
2025-09-12 18:04:15,595 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:04:15,597 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:05:28,084 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 702.55798 ± 126.288
2025-09-12 18:05:28,096 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [1052.7444, 690.3023, 690.0239, 693.7599, 649.2205, 706.07465, 528.394, 669.3936, 658.45593, 687.2105]
2025-09-12 18:05:28,096 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [401.0, 261.0, 261.0, 230.0, 216.0, 245.0, 210.0, 248.0, 202.0, 236.0]
2025-09-12 18:05:28,096 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1226 [INFO]: New best (702.56) for latency MM1Queue_a033_s075
2025-09-12 18:05:28,114 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 69/100 (estimated time remaining: 7 hours, 7 minutes, 18 seconds)
2025-09-12 18:17:33,867 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:17:33,870 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:18:47,185 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 675.86786 ± 227.670
2025-09-12 18:18:47,192 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [664.0134, 517.401, 760.5786, 794.29395, 847.70825, 78.8161, 593.34, 787.33484, 840.3228, 874.8697]
2025-09-12 18:18:47,192 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [229.0, 172.0, 285.0, 295.0, 317.0, 72.0, 241.0, 263.0, 334.0, 337.0]
2025-09-12 18:18:47,200 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 70/100 (estimated time remaining: 6 hours, 54 minutes, 23 seconds)
2025-09-12 18:30:47,380 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:30:47,389 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:31:50,422 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 617.92493 ± 241.928
2025-09-12 18:31:50,422 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [647.83673, 514.63306, 602.2508, 25.629967, 619.3311, 661.5972, 571.13086, 1045.3129, 785.9277, 705.59863]
2025-09-12 18:31:50,423 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [208.0, 206.0, 217.0, 34.0, 231.0, 207.0, 213.0, 374.0, 300.0, 230.0]
2025-09-12 18:31:50,432 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 71/100 (estimated time remaining: 6 hours, 38 minutes, 9 seconds)
2025-09-12 18:43:46,372 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:43:46,381 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:44:58,134 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 741.03113 ± 177.570
2025-09-12 18:44:58,137 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [374.09842, 869.46686, 794.0444, 716.20514, 676.0737, 641.5295, 718.44904, 662.4849, 863.4137, 1094.5455]
2025-09-12 18:44:58,137 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [165.0, 288.0, 250.0, 230.0, 226.0, 218.0, 248.0, 217.0, 299.0, 386.0]
2025-09-12 18:44:58,137 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1226 [INFO]: New best (741.03) for latency MM1Queue_a033_s075
2025-09-12 18:44:58,186 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 72/100 (estimated time remaining: 6 hours, 24 minutes, 33 seconds)
2025-09-12 18:57:03,752 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:57:03,753 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:58:12,622 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 673.98962 ± 116.223
2025-09-12 18:58:12,623 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [496.2285, 553.3078, 765.872, 923.2979, 630.00726, 720.4344, 743.0543, 679.84753, 580.70355, 647.1427]
2025-09-12 18:58:12,623 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [198.0, 210.0, 261.0, 312.0, 222.0, 290.0, 263.0, 228.0, 200.0, 226.0]
2025-09-12 18:58:12,637 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 73/100 (estimated time remaining: 6 hours, 9 minutes, 16 seconds)
2025-09-12 19:10:19,159 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:10:19,162 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:11:47,294 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 849.11444 ± 114.963
2025-09-12 19:11:47,296 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [819.63873, 939.2849, 788.00146, 997.7592, 906.85803, 810.4077, 706.7629, 624.39075, 919.23175, 978.8094]
2025-09-12 19:11:47,296 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [262.0, 319.0, 257.0, 483.0, 314.0, 302.0, 271.0, 231.0, 324.0, 330.0]
2025-09-12 19:11:47,297 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1226 [INFO]: New best (849.11) for latency MM1Queue_a033_s075
2025-09-12 19:11:47,306 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 74/100 (estimated time remaining: 5 hours, 58 minutes, 7 seconds)
2025-09-12 19:23:45,963 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:23:45,965 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:24:59,940 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 692.06531 ± 286.281
2025-09-12 19:24:59,941 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [655.44586, 706.269, 698.89325, 695.00256, 1238.4706, 713.46783, 649.1785, 946.4129, 27.18064, 590.33154]
2025-09-12 19:24:59,941 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [251.0, 264.0, 250.0, 243.0, 477.0, 261.0, 251.0, 332.0, 37.0, 246.0]
2025-09-12 19:24:59,949 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 75/100 (estimated time remaining: 5 hours, 44 minutes, 18 seconds)
2025-09-12 19:36:49,379 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:36:49,381 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:38:02,879 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 730.26654 ± 113.903
2025-09-12 19:38:02,880 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [717.8698, 829.14496, 820.73456, 872.0289, 672.61523, 737.6903, 662.9148, 510.92804, 872.2938, 606.4449]
2025-09-12 19:38:02,880 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [256.0, 320.0, 274.0, 317.0, 238.0, 242.0, 228.0, 184.0, 304.0, 220.0]
2025-09-12 19:38:02,890 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 76/100 (estimated time remaining: 5 hours, 31 minutes, 2 seconds)
2025-09-12 19:50:02,560 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:50:02,562 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:51:16,410 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 746.22998 ± 148.694
2025-09-12 19:51:16,411 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [769.9941, 665.5974, 655.51, 1064.3427, 566.10785, 963.6769, 650.4394, 643.45215, 691.10065, 792.0785]
2025-09-12 19:51:16,411 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [248.0, 215.0, 224.0, 355.0, 206.0, 324.0, 230.0, 222.0, 259.0, 313.0]
2025-09-12 19:51:16,420 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 77/100 (estimated time remaining: 5 hours, 18 minutes, 15 seconds)
2025-09-12 20:03:34,540 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:03:34,543 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:04:42,215 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 698.24176 ± 199.571
2025-09-12 20:04:42,221 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [1192.4241, 608.95483, 924.51996, 615.54224, 670.53674, 510.20938, 715.35596, 508.97635, 656.19275, 579.7052]
2025-09-12 20:04:42,221 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [403.0, 199.0, 306.0, 212.0, 213.0, 210.0, 219.0, 178.0, 237.0, 194.0]
2025-09-12 20:04:42,261 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 78/100 (estimated time remaining: 5 hours, 5 minutes, 52 seconds)
2025-09-12 20:16:43,394 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:16:43,397 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:17:59,793 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 815.34314 ± 114.281
2025-09-12 20:17:59,794 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [934.66736, 641.1819, 705.0723, 921.6307, 718.44476, 852.9633, 905.4578, 704.0725, 988.82806, 781.1123]
2025-09-12 20:17:59,794 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [309.0, 223.0, 235.0, 279.0, 288.0, 274.0, 264.0, 240.0, 313.0, 247.0]
2025-09-12 20:17:59,802 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 79/100 (estimated time remaining: 4 hours, 51 minutes, 18 seconds)
2025-09-12 20:29:41,972 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:29:41,975 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:31:09,862 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 854.67072 ± 168.076
2025-09-12 20:31:09,869 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [788.31384, 1170.4867, 957.4948, 969.4023, 693.4875, 746.0949, 860.09546, 958.52094, 873.23676, 529.5736]
2025-09-12 20:31:09,869 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [266.0, 448.0, 319.0, 389.0, 239.0, 277.0, 320.0, 337.0, 309.0, 192.0]
2025-09-12 20:31:09,870 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1226 [INFO]: New best (854.67) for latency MM1Queue_a033_s075
2025-09-12 20:31:09,879 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 80/100 (estimated time remaining: 4 hours, 37 minutes, 53 seconds)
2025-09-12 20:43:05,895 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:43:05,898 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:44:21,880 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 802.86340 ± 137.492
2025-09-12 20:44:21,883 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [720.0383, 727.40784, 783.0611, 698.32446, 759.6783, 735.05884, 893.46875, 915.8011, 1142.5168, 653.2788]
2025-09-12 20:44:21,883 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [233.0, 238.0, 269.0, 262.0, 246.0, 224.0, 267.0, 287.0, 360.0, 258.0]
2025-09-12 20:44:21,895 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 81/100 (estimated time remaining: 4 hours, 25 minutes, 16 seconds)
2025-09-12 20:56:24,995 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:56:25,003 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:57:33,170 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 707.64001 ± 180.647
2025-09-12 20:57:33,172 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [516.882, 612.5955, 667.31915, 654.8052, 727.31854, 696.3994, 579.12555, 570.5399, 887.2609, 1164.1538]
2025-09-12 20:57:33,172 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [169.0, 212.0, 223.0, 229.0, 256.0, 246.0, 200.0, 209.0, 296.0, 365.0]
2025-09-12 20:57:33,182 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 82/100 (estimated time remaining: 4 hours, 11 minutes, 51 seconds)
2025-09-12 21:09:33,636 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:09:33,639 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:10:48,447 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 746.50928 ± 297.768
2025-09-12 21:10:48,448 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [1156.4111, 1042.2911, 647.4472, 94.834114, 664.7578, 1042.221, 760.5292, 905.7589, 671.4211, 479.42194]
2025-09-12 21:10:48,448 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [389.0, 332.0, 222.0, 95.0, 247.0, 369.0, 266.0, 318.0, 225.0, 184.0]
2025-09-12 21:10:48,480 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 83/100 (estimated time remaining: 3 hours, 57 minutes, 58 seconds)
2025-09-12 21:22:44,820 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:22:44,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:23:58,878 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 736.25647 ± 311.551
2025-09-12 21:23:58,879 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [1021.29767, 2.9039907, 746.87274, 1169.1051, 959.84094, 689.315, 618.9599, 499.6326, 940.42224, 714.21484]
2025-09-12 21:23:58,879 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [363.0, 12.0, 231.0, 402.0, 303.0, 240.0, 261.0, 197.0, 322.0, 246.0]
2025-09-12 21:23:58,889 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 84/100 (estimated time remaining: 3 hours, 44 minutes, 20 seconds)
2025-09-12 21:36:12,623 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:36:12,625 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:38:26,347 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 1335.11230 ± 556.549
2025-09-12 21:38:26,349 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [390.88815, 1413.1251, 1007.4718, 848.11426, 1461.5588, 1223.5831, 2122.9866, 2343.794, 969.6636, 1569.938]
2025-09-12 21:38:26,349 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [146.0, 467.0, 379.0, 264.0, 492.0, 457.0, 860.0, 788.0, 342.0, 526.0]
2025-09-12 21:38:26,349 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1226 [INFO]: New best (1335.11) for latency MM1Queue_a033_s075
2025-09-12 21:38:26,391 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 85/100 (estimated time remaining: 3 hours, 35 minutes, 16 seconds)
2025-09-12 21:50:16,960 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:50:16,980 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:51:59,354 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 1001.77673 ± 590.092
2025-09-12 21:51:59,355 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [514.3876, 1180.4991, 657.53705, 849.02814, 655.13574, 869.7272, 2618.5676, 460.3304, 1151.813, 1060.7422]
2025-09-12 21:51:59,356 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [176.0, 404.0, 208.0, 307.0, 258.0, 294.0, 963.0, 176.0, 456.0, 348.0]
2025-09-12 21:51:59,373 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 86/100 (estimated time remaining: 3 hours, 22 minutes, 52 seconds)
2025-09-12 22:03:59,477 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:03:59,486 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:05:25,256 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 858.22083 ± 214.191
2025-09-12 22:05:25,257 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [1066.5868, 1221.3986, 992.21027, 966.5275, 506.78937, 934.9964, 586.0274, 785.7926, 875.21545, 646.66345]
2025-09-12 22:05:25,257 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [326.0, 444.0, 324.0, 330.0, 180.0, 374.0, 195.0, 332.0, 279.0, 226.0]
2025-09-12 22:05:25,268 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 87/100 (estimated time remaining: 3 hours, 10 minutes, 1 second)
2025-09-12 22:17:28,487 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:17:28,495 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:19:11,930 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 993.81689 ± 332.551
2025-09-12 22:19:11,931 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [1220.7338, 922.0513, 1207.8785, 739.6671, 1471.3958, 1425.2765, 996.3776, 323.98282, 737.1187, 893.68646]
2025-09-12 22:19:11,931 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [465.0, 294.0, 441.0, 252.0, 577.0, 510.0, 370.0, 145.0, 254.0, 298.0]
2025-09-12 22:19:11,944 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 88/100 (estimated time remaining: 2 hours, 57 minutes, 49 seconds)
2025-09-12 22:31:23,437 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:31:23,446 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:32:49,337 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 863.81848 ± 425.526
2025-09-12 22:32:49,342 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [912.05, 1688.806, 802.4744, 905.57697, 667.0465, 39.516052, 972.5556, 1393.9419, 620.4553, 635.76135]
2025-09-12 22:32:49,342 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [315.0, 571.0, 251.0, 292.0, 277.0, 43.0, 402.0, 461.0, 201.0, 221.0]
2025-09-12 22:32:49,352 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 89/100 (estimated time remaining: 2 hours, 45 minutes, 13 seconds)
2025-09-12 22:44:23,213 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:44:23,224 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:46:09,912 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 1034.39282 ± 229.033
2025-09-12 22:46:09,918 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [1015.8047, 647.12115, 1108.1276, 1028.225, 1000.54156, 791.5655, 1485.2313, 1342.5681, 967.3282, 957.415]
2025-09-12 22:46:09,918 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [342.0, 261.0, 382.0, 384.0, 409.0, 290.0, 542.0, 449.0, 348.0, 320.0]
2025-09-12 22:46:09,962 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 90/100 (estimated time remaining: 2 hours, 28 minutes, 59 seconds)
2025-09-12 22:57:59,700 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:57:59,710 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:59:49,507 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 1147.51562 ± 511.549
2025-09-12 22:59:49,513 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [1368.0966, 981.2782, 896.2394, 1023.9252, 1820.6648, 1081.4452, 545.7381, 716.2507, 753.33636, 2288.1829]
2025-09-12 22:59:49,513 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [458.0, 347.0, 296.0, 332.0, 602.0, 367.0, 193.0, 266.0, 233.0, 779.0]
2025-09-12 22:59:49,538 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 91/100 (estimated time remaining: 2 hours, 15 minutes, 40 seconds)
2025-09-12 23:12:03,244 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:12:03,255 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 23:14:17,389 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 1298.02612 ± 557.072
2025-09-12 23:14:17,391 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [1512.5974, 1458.0886, 667.4329, 1289.2008, 870.0216, 1326.0566, 902.0521, 2651.867, 694.6935, 1608.2518]
2025-09-12 23:14:17,391 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [525.0, 532.0, 245.0, 457.0, 339.0, 479.0, 330.0, 1000.0, 219.0, 607.0]
2025-09-12 23:14:17,405 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 92/100 (estimated time remaining: 2 hours, 3 minutes, 57 seconds)
2025-09-12 23:25:55,607 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:25:55,616 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 23:27:28,789 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 969.48761 ± 349.467
2025-09-12 23:27:28,790 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [850.0379, 691.06915, 741.09894, 1540.4076, 674.10223, 1270.6279, 1153.9177, 1490.1957, 805.1308, 478.28833]
2025-09-12 23:27:28,790 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [275.0, 245.0, 243.0, 540.0, 238.0, 439.0, 351.0, 519.0, 283.0, 161.0]
2025-09-12 23:27:28,800 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 93/100 (estimated time remaining: 1 hour, 49 minutes, 14 seconds)
2025-09-12 23:39:36,146 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:39:36,185 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 23:42:05,565 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 1412.95874 ± 735.985
2025-09-12 23:42:05,584 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [2837.208, 1044.5795, 2009.4812, 872.99646, 759.72815, 2367.5393, 526.7278, 1745.2643, 953.79205, 1012.26917]
2025-09-12 23:42:05,585 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 356.0, 705.0, 309.0, 284.0, 1000.0, 193.0, 673.0, 377.0, 346.0]
2025-09-12 23:42:05,585 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1226 [INFO]: New best (1412.96) for latency MM1Queue_a033_s075
2025-09-12 23:42:05,609 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 36 minutes, 58 seconds)
2025-09-12 23:54:05,746 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:54:05,758 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 23:55:38,460 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 929.06708 ± 402.790
2025-09-12 23:55:38,466 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [1673.7532, 1162.6715, 458.99774, 714.2956, 633.4503, 643.80304, 747.0397, 935.4165, 1628.064, 693.179]
2025-09-12 23:55:38,466 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [506.0, 410.0, 174.0, 239.0, 246.0, 238.0, 252.0, 310.0, 629.0, 256.0]
2025-09-12 23:55:38,484 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 23 minutes, 22 seconds)
2025-09-13 00:07:50,245 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:07:50,256 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 00:09:48,072 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 1187.97070 ± 473.632
2025-09-13 00:09:48,074 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [691.61395, 979.4966, 877.38135, 1950.0061, 1552.1727, 583.6453, 1652.7858, 836.48535, 1806.1722, 949.94794]
2025-09-13 00:09:48,074 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [272.0, 312.0, 329.0, 648.0, 568.0, 225.0, 538.0, 325.0, 622.0, 286.0]
2025-09-13 00:09:48,088 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 9 minutes, 58 seconds)
2025-09-13 00:21:57,704 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:21:57,713 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 00:23:48,521 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 1067.59534 ± 466.859
2025-09-13 00:23:48,523 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [700.70276, 715.7836, 466.78217, 1384.0554, 1839.741, 1767.1351, 1264.8397, 798.39923, 565.5678, 1172.9459]
2025-09-13 00:23:48,523 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [246.0, 241.0, 203.0, 480.0, 640.0, 600.0, 440.0, 378.0, 195.0, 447.0]
2025-09-13 00:23:48,537 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 97/100 (estimated time remaining: 55 minutes, 36 seconds)
2025-09-13 00:35:38,783 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:35:38,792 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 00:38:23,385 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 1608.59753 ± 877.126
2025-09-13 00:38:23,386 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [2589.9192, 1441.6255, 898.08215, 442.47134, 1406.9492, 681.1148, 734.9785, 2478.0593, 2644.7822, 2767.994]
2025-09-13 00:38:23,386 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 507.0, 267.0, 156.0, 514.0, 224.0, 237.0, 1000.0, 907.0, 1000.0]
2025-09-13 00:38:23,386 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1226 [INFO]: New best (1608.60) for latency MM1Queue_a033_s075
2025-09-13 00:38:23,397 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 98/100 (estimated time remaining: 42 minutes, 32 seconds)
2025-09-13 00:50:06,426 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:50:06,438 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 00:51:58,067 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 1158.29602 ± 519.379
2025-09-13 00:51:58,068 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [794.0049, 2075.8608, 750.23193, 514.92334, 765.4968, 1849.1084, 1651.6917, 662.287, 1238.0227, 1281.3322]
2025-09-13 00:51:58,068 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [242.0, 772.0, 258.0, 176.0, 261.0, 617.0, 521.0, 220.0, 422.0, 453.0]
2025-09-13 00:51:58,096 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 99/100 (estimated time remaining: 27 minutes, 56 seconds)
2025-09-13 01:03:41,768 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:03:41,776 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 01:05:16,467 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 883.41418 ± 450.356
2025-09-13 01:05:16,469 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [1065.463, 1259.1583, 729.44916, 589.5044, 1060.9127, 461.90625, 453.54156, 500.4804, 742.5006, 1971.2252]
2025-09-13 01:05:16,470 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [379.0, 454.0, 251.0, 324.0, 380.0, 204.0, 178.0, 199.0, 244.0, 705.0]
2025-09-13 01:05:16,491 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1199 [INFO]: Iteration 100/100 (estimated time remaining: 13 minutes, 55 seconds)
2025-09-13 01:17:24,941 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:17:24,949 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 01:19:57,977 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 1549.94116 ± 684.191
2025-09-13 01:19:57,983 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [1085.0852, 1330.4677, 1766.6833, 637.36554, 1055.2777, 2514.6357, 1098.1201, 2708.4592, 1021.2558, 2282.061]
2025-09-13 01:19:57,983 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [364.0, 433.0, 596.0, 236.0, 384.0, 810.0, 404.0, 1000.0, 400.0, 773.0]
2025-09-13 01:19:57,998 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-walker2d):1251 [DEBUG]: Training session finished
