2025-09-12 03:09:54,909 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noiseperc15-walker2d/MM1Queue_a033_s075-mbpac_memdelay
2025-09-12 03:09:54,909 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noiseperc15-walker2d/MM1Queue_a033_s075-mbpac_memdelay
2025-09-12 03:09:54,909 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x1480343ef850>}
2025-09-12 03:09:54,909 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1111 [DEBUG]: using device: cuda
2025-09-12 03:09:54,915 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1133 [INFO]: Creating new trainer
2025-09-12 03:09:54,933 baseline-mbpac-noiseperc15-walker2d:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
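The Linear shapes in the repr above are enough to tally the policy's trainable parameter count; a minimal sketch, assuming standard fully connected layers with bias (Flatten, ReLU, Unflatten, and NNTanhRefit contribute nothing — as printed, NNTanhRefit holds only fixed scale/shift tensors):

```python
# Parameter count of a Linear(in, out, bias=True): in*out weights + out biases.
def linear_params(n_in, n_out):
    return n_in * n_out + n_out

# common_head: 384 -> 256 -> 256, then two parallel 256 -> 6 heads (mu, log_std).
common = linear_params(384, 256) + linear_params(256, 256)
heads = 2 * linear_params(256, 6)
total = common + heads
print(total)  # 167436
```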
2025-09-12 03:09:54,933 baseline-mbpac-noiseperc15-walker2d:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
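The Q network's first Linear expects 23 features, which is consistent with NNLayerConcat2 flattening and concatenating its two inputs along the last dim; a quick sanity check, assuming the left input is a 17-wide state and the right the 6-wide action (matching the 17- and 6-wide heads printed elsewhere in this log):

```python
state_dim, action_dim = 17, 6       # widths of the two flattened inputs (assumed)
q_in_features = state_dim + action_dim  # NNLayerConcat2 along dim=-1
assert q_in_features == 23          # matches Linear(in_features=23, ...) above
```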
2025-09-12 03:09:54,942 baseline-mbpac-noiseperc15-walker2d:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=6, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 384, batch_first=True)
)
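The model's module widths form a closed loop; a minimal consistency check of the wiring, assuming the GRU hidden state (width 384) is the belief that both the emitter and the 384-input policy printed earlier consume:

```python
STATE, ACTION, EMBED, BELIEF = 17, 6, 256, 384

# (input_width, output_width) of each stage, read off the reprs above.
wiring = {
    "net_embed_action": (ACTION, EMBED),   # 6 -> 256, feeds the GRU input
    "net_rec":          (EMBED, BELIEF),   # GRU(256, 384, batch_first=True)
    "net_embed_state":  (STATE, BELIEF),   # 17 -> ... -> 384
    "emitter":          (BELIEF, STATE),   # 384 -> mu/log_std heads of width 17
    "pi":               (BELIEF, ACTION),  # 384 -> 6, the NNGaussianPolicy
}

# Each stage's output width must equal the next stage's input width.
assert wiring["net_embed_action"][1] == wiring["net_rec"][0]
assert wiring["net_rec"][1] == wiring["emitter"][0] == wiring["pi"][0]
```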
2025-09-12 03:09:55,858 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1194 [DEBUG]: Starting training session...
2025-09-12 03:09:55,859 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 1/100
2025-09-12 03:20:22,485 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:20:22,487 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:21:15,436 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 154.53134 ± 134.569
2025-09-12 03:21:15,437 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [27.98025, 257.0892, 175.57472, 52.94424, 338.71143, 204.91869, 35.93307, 395.43225, 69.56834, -12.838915]
2025-09-12 03:21:15,437 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [54.0, 143.0, 189.0, 155.0, 207.0, 346.0, 174.0, 304.0, 209.0, 95.0]
2025-09-12 03:21:15,437 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1226 [INFO]: New best (154.53) for latency MM1Queue_a033_s075
2025-09-12 03:21:15,442 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 2/100 (estimated time remaining: 18 hours, 41 minutes, 18 seconds)
2025-09-12 03:33:25,080 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:33:25,082 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:34:10,423 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 100.44245 ± 127.787
2025-09-12 03:34:10,423 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [284.98282, 7.3924904, -3.647372, -7.6388664, -27.438177, -23.813988, 232.09889, 279.74573, 217.61787, 45.125145]
2025-09-12 03:34:10,423 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [181.0, 41.0, 101.0, 126.0, 150.0, 97.0, 489.0, 161.0, 139.0, 109.0]
2025-09-12 03:34:10,433 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 3/100 (estimated time remaining: 19 hours, 47 minutes, 54 seconds)
2025-09-12 03:46:03,102 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:46:03,110 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 03:47:04,095 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 226.67017 ± 151.123
2025-09-12 03:47:04,096 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [108.28964, 540.4356, 141.54765, 40.400795, 286.17084, 103.15073, 218.16595, 417.2858, 300.1972, 111.05738]
2025-09-12 03:47:04,096 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [161.0, 529.0, 283.0, 41.0, 166.0, 145.0, 144.0, 301.0, 203.0, 151.0]
2025-09-12 03:47:04,096 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1226 [INFO]: New best (226.67) for latency MM1Queue_a033_s075
2025-09-12 03:47:04,102 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 4/100 (estimated time remaining: 20 hours, 46 seconds)
2025-09-12 03:59:08,911 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:59:08,919 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:00:19,917 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 251.18661 ± 155.376
2025-09-12 04:00:19,920 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [354.9994, 504.90823, 172.00746, 294.72253, 124.48126, 159.02089, 97.486305, 501.54776, 34.492443, 268.19998]
2025-09-12 04:00:19,920 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [301.0, 460.0, 235.0, 170.0, 168.0, 247.0, 255.0, 408.0, 52.0, 171.0]
2025-09-12 04:00:19,920 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1226 [INFO]: New best (251.19) for latency MM1Queue_a033_s075
2025-09-12 04:00:19,924 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 5/100 (estimated time remaining: 20 hours, 9 minutes, 37 seconds)
2025-09-12 04:12:17,797 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:12:17,806 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:13:26,791 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 281.72784 ± 135.890
2025-09-12 04:13:26,792 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [498.97202, 57.96035, 316.2209, 419.94235, 119.6768, 295.3022, 338.4566, 118.96628, 385.24777, 266.53323]
2025-09-12 04:13:26,792 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [328.0, 139.0, 190.0, 353.0, 227.0, 177.0, 202.0, 133.0, 490.0, 183.0]
2025-09-12 04:13:26,792 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1226 [INFO]: New best (281.73) for latency MM1Queue_a033_s075
2025-09-12 04:13:26,796 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 6/100 (estimated time remaining: 20 hours, 6 minutes, 47 seconds)
2025-09-12 04:25:42,451 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:25:42,454 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:26:45,320 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 291.08749 ± 178.126
2025-09-12 04:26:45,320 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [101.69174, 386.8085, 51.282368, 32.898533, 615.82513, 276.22668, 349.9129, 286.14627, 485.52997, 324.55267]
2025-09-12 04:26:45,320 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [133.0, 246.0, 186.0, 45.0, 455.0, 157.0, 238.0, 150.0, 350.0, 240.0]
2025-09-12 04:26:45,320 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1226 [INFO]: New best (291.09) for latency MM1Queue_a033_s075
2025-09-12 04:26:45,327 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 7/100 (estimated time remaining: 20 hours, 31 minutes, 21 seconds)
2025-09-12 04:38:33,348 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:38:33,350 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:39:25,686 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 244.25533 ± 171.580
2025-09-12 04:39:25,687 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [55.08085, 481.17212, 266.47137, 0.98246765, 348.36435, 55.60908, 66.66475, 408.61734, 356.51254, 403.07822]
2025-09-12 04:39:25,687 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [67.0, 318.0, 152.0, 11.0, 235.0, 209.0, 91.0, 294.0, 216.0, 221.0]
2025-09-12 04:39:25,700 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 8/100 (estimated time remaining: 20 hours, 13 minutes, 43 seconds)
2025-09-12 04:51:36,607 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:51:36,620 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 04:52:18,526 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 227.66797 ± 144.858
2025-09-12 04:52:18,527 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [370.5439, 25.758093, 376.8764, 397.7678, 352.06274, 221.93964, 252.63844, 5.4381742, 48.963776, 224.69061]
2025-09-12 04:52:18,527 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [227.0, 55.0, 256.0, 277.0, 185.0, 128.0, 134.0, 18.0, 56.0, 132.0]
2025-09-12 04:52:18,540 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 9/100 (estimated time remaining: 20 hours, 25 seconds)
2025-09-12 05:04:23,133 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:04:23,142 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:05:18,366 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 346.57785 ± 140.380
2025-09-12 05:05:18,366 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [304.48648, 383.83792, 647.7329, 341.22842, 342.3337, 332.81543, 410.64545, 403.82242, 54.924923, 243.95091]
2025-09-12 05:05:18,366 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [201.0, 200.0, 311.0, 183.0, 169.0, 160.0, 294.0, 209.0, 66.0, 143.0]
2025-09-12 05:05:18,366 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1226 [INFO]: New best (346.58) for latency MM1Queue_a033_s075
2025-09-12 05:05:18,374 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 10/100 (estimated time remaining: 19 hours, 42 minutes, 31 seconds)
2025-09-12 05:17:15,654 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:17:15,672 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:18:40,417 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 350.32391 ± 197.357
2025-09-12 05:18:40,419 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [628.91284, 313.58893, 278.10086, 244.41756, 194.65558, 513.9557, 401.0899, 168.75804, 59.09395, 700.6659]
2025-09-12 05:18:40,419 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [630.0, 337.0, 230.0, 328.0, 166.0, 275.0, 217.0, 132.0, 117.0, 531.0]
2025-09-12 05:18:40,419 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1226 [INFO]: New best (350.32) for latency MM1Queue_a033_s075
2025-09-12 05:18:40,428 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 11/100 (estimated time remaining: 19 hours, 34 minutes, 5 seconds)
2025-09-12 05:30:41,086 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:30:41,088 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:31:34,805 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 355.92105 ± 96.097
2025-09-12 05:31:34,805 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [271.8429, 259.70358, 317.38333, 392.60767, 572.75214, 429.04346, 400.6368, 225.69775, 373.23547, 316.3077]
2025-09-12 05:31:34,805 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [144.0, 136.0, 159.0, 182.0, 305.0, 208.0, 202.0, 129.0, 194.0, 211.0]
2025-09-12 05:31:34,805 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1226 [INFO]: New best (355.92) for latency MM1Queue_a033_s075
2025-09-12 05:31:34,827 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 12/100 (estimated time remaining: 19 hours, 13 minutes, 53 seconds)
2025-09-12 05:43:51,685 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:43:51,705 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:44:51,675 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 402.45490 ± 76.025
2025-09-12 05:44:51,676 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [443.7192, 481.83194, 336.76166, 354.15897, 374.53918, 475.03452, 303.73425, 294.9664, 436.2923, 523.5105]
2025-09-12 05:44:51,676 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [198.0, 208.0, 214.0, 159.0, 283.0, 251.0, 155.0, 142.0, 221.0, 277.0]
2025-09-12 05:44:51,676 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1226 [INFO]: New best (402.45) for latency MM1Queue_a033_s075
2025-09-12 05:44:51,689 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 13/100 (estimated time remaining: 19 hours, 11 minutes, 37 seconds)
2025-09-12 05:56:38,724 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:56:38,727 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 05:57:19,720 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 259.36789 ± 215.903
2025-09-12 05:57:19,721 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [487.90002, 566.2711, 66.327736, 268.11392, 414.88232, 559.98724, 1.2570084, 79.7063, 92.821884, 56.411316]
2025-09-12 05:57:19,721 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [253.0, 272.0, 71.0, 138.0, 203.0, 258.0, 14.0, 79.0, 86.0, 57.0]
2025-09-12 05:57:19,728 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 14/100 (estimated time remaining: 18 hours, 51 minutes, 20 seconds)
2025-09-12 06:09:27,148 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:09:27,150 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:10:33,546 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 446.81592 ± 208.129
2025-09-12 06:10:33,547 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [481.4864, 96.47463, 481.19424, 317.45764, 905.40094, 429.08282, 683.5479, 341.63885, 382.03647, 349.83923]
2025-09-12 06:10:33,547 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [236.0, 160.0, 273.0, 204.0, 333.0, 282.0, 320.0, 150.0, 184.0, 165.0]
2025-09-12 06:10:33,547 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1226 [INFO]: New best (446.82) for latency MM1Queue_a033_s075
2025-09-12 06:10:33,572 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 15/100 (estimated time remaining: 18 hours, 42 minutes, 21 seconds)
2025-09-12 06:22:57,934 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:22:57,939 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:23:44,950 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 351.69058 ± 187.955
2025-09-12 06:23:44,950 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [308.40033, 247.65356, 280.47708, 191.50839, 1.9421171, 309.66406, 409.62888, 619.3487, 497.37173, 650.9109]
2025-09-12 06:23:44,950 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [148.0, 187.0, 154.0, 97.0, 12.0, 134.0, 181.0, 264.0, 214.0, 238.0]
2025-09-12 06:23:44,957 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 16/100 (estimated time remaining: 18 hours, 26 minutes, 17 seconds)
2025-09-12 06:35:37,382 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:35:37,384 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:36:39,310 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 512.30109 ± 163.653
2025-09-12 06:36:39,310 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [517.3991, 527.454, 287.60013, 682.684, 281.31897, 824.57666, 458.16888, 426.58508, 452.2868, 664.93726]
2025-09-12 06:36:39,311 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [195.0, 347.0, 131.0, 270.0, 128.0, 304.0, 197.0, 188.0, 182.0, 242.0]
2025-09-12 06:36:39,311 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1226 [INFO]: New best (512.30) for latency MM1Queue_a033_s075
2025-09-12 06:36:39,318 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 17/100 (estimated time remaining: 18 hours, 13 minutes, 15 seconds)
2025-09-12 06:48:46,009 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:48:46,047 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 06:49:48,891 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 333.49918 ± 223.851
2025-09-12 06:49:48,897 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [309.6623, 421.62845, 315.4973, 307.5682, 667.55994, 596.12555, 14.30218, 62.42434, 580.3666, 59.856396]
2025-09-12 06:49:48,897 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [163.0, 171.0, 502.0, 155.0, 455.0, 308.0, 23.0, 59.0, 221.0, 114.0]
2025-09-12 06:49:48,931 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 18/100 (estimated time remaining: 17 hours, 58 minutes, 14 seconds)
2025-09-12 07:01:39,302 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:01:39,310 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:02:52,613 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 455.86652 ± 94.758
2025-09-12 07:02:52,614 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [369.62427, 445.19455, 427.56146, 384.87674, 496.86768, 494.72128, 408.8784, 691.4426, 500.0446, 339.4529]
2025-09-12 07:02:52,614 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [158.0, 263.0, 233.0, 249.0, 270.0, 227.0, 312.0, 417.0, 233.0, 218.0]
2025-09-12 07:02:52,622 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 19/100 (estimated time remaining: 17 hours, 54 minutes, 59 seconds)
2025-09-12 07:14:58,275 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:14:58,277 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:15:47,207 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 387.99609 ± 277.859
2025-09-12 07:15:47,208 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [19.471361, 2.5297208, 256.97522, 368.02942, 260.2208, 438.24396, 844.477, 834.98083, 568.85034, 286.18225]
2025-09-12 07:15:47,208 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [27.0, 19.0, 117.0, 171.0, 130.0, 182.0, 346.0, 357.0, 222.0, 147.0]
2025-09-12 07:15:47,212 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 20/100 (estimated time remaining: 17 hours, 36 minutes, 40 seconds)
2025-09-12 07:28:07,502 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:28:07,504 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:29:03,821 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 419.03497 ± 183.355
2025-09-12 07:29:03,821 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [694.0581, 521.34265, 255.44606, 398.75922, 340.03302, 393.13126, 628.35986, 257.69287, 89.198524, 612.3279]
2025-09-12 07:29:03,821 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [273.0, 175.0, 132.0, 175.0, 195.0, 187.0, 246.0, 160.0, 101.0, 294.0]
2025-09-12 07:29:03,832 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 21/100 (estimated time remaining: 17 hours, 25 minutes, 1 second)
2025-09-12 07:41:07,369 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:41:07,390 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:42:10,985 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 510.05200 ± 130.230
2025-09-12 07:42:10,986 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [473.44122, 397.14996, 401.10275, 632.6153, 492.7082, 514.0426, 323.65552, 557.85754, 811.84595, 496.1013]
2025-09-12 07:42:10,986 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [179.0, 156.0, 251.0, 235.0, 307.0, 205.0, 165.0, 196.0, 350.0, 172.0]
2025-09-12 07:42:11,019 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 22/100 (estimated time remaining: 17 hours, 15 minutes, 20 seconds)
2025-09-12 07:54:06,250 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:54:06,261 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 07:54:59,685 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 425.46356 ± 217.779
2025-09-12 07:54:59,686 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [392.58908, 572.18054, 660.94904, 813.47034, 346.4238, 318.75604, 221.97025, 36.09497, 315.40408, 576.7975]
2025-09-12 07:54:59,686 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [173.0, 230.0, 291.0, 264.0, 140.0, 151.0, 109.0, 55.0, 187.0, 257.0]
2025-09-12 07:54:59,713 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 23/100 (estimated time remaining: 16 hours, 56 minutes, 48 seconds)
2025-09-12 08:07:05,680 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:07:05,687 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:08:11,756 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 555.24225 ± 222.924
2025-09-12 08:08:11,758 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [305.53574, 477.8093, 487.27225, 807.3886, 234.75758, 696.1612, 539.8703, 1007.8915, 383.5745, 612.1611]
2025-09-12 08:08:11,758 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [173.0, 215.0, 193.0, 311.0, 121.0, 227.0, 277.0, 335.0, 162.0, 271.0]
2025-09-12 08:08:11,758 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1226 [INFO]: New best (555.24) for latency MM1Queue_a033_s075
2025-09-12 08:08:11,792 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 24/100 (estimated time remaining: 16 hours, 45 minutes, 55 seconds)
2025-09-12 08:20:18,209 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:20:18,211 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:21:24,770 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 618.23767 ± 256.811
2025-09-12 08:21:24,790 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [1216.2751, 735.1896, 344.31287, 759.98364, 308.61932, 454.78833, 468.4232, 643.96716, 779.15643, 471.66135]
2025-09-12 08:21:24,790 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [423.0, 243.0, 163.0, 282.0, 126.0, 179.0, 169.0, 240.0, 266.0, 199.0]
2025-09-12 08:21:24,790 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1226 [INFO]: New best (618.24) for latency MM1Queue_a033_s075
2025-09-12 08:21:24,799 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 25/100 (estimated time remaining: 16 hours, 37 minutes, 31 seconds)
2025-09-12 08:33:35,777 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:33:35,779 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:34:55,375 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 642.88770 ± 108.060
2025-09-12 08:34:55,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [578.1445, 724.35406, 655.6008, 577.9455, 706.3515, 725.6661, 758.00775, 367.05417, 674.8409, 660.91187]
2025-09-12 08:34:55,377 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [237.0, 493.0, 238.0, 232.0, 338.0, 304.0, 256.0, 153.0, 271.0, 259.0]
2025-09-12 08:34:55,377 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1226 [INFO]: New best (642.89) for latency MM1Queue_a033_s075
2025-09-12 08:34:55,422 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 26/100 (estimated time remaining: 16 hours, 27 minutes, 53 seconds)
2025-09-12 08:46:53,865 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:46:53,866 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 08:48:07,968 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 632.04968 ± 310.345
2025-09-12 08:48:07,969 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [500.59915, 491.78583, 103.315575, 559.37067, 1218.4049, 571.428, 595.35974, 720.00836, 1127.1926, 433.03247]
2025-09-12 08:48:07,969 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [207.0, 213.0, 97.0, 220.0, 442.0, 204.0, 304.0, 297.0, 399.0, 199.0]
2025-09-12 08:48:07,976 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 27/100 (estimated time remaining: 16 hours, 16 minutes, 2 seconds)
2025-09-12 09:00:23,335 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:00:23,361 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:01:30,327 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 608.24695 ± 268.180
2025-09-12 09:01:30,327 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [1246.9155, 371.85956, 492.25406, 450.45712, 301.84683, 584.34973, 465.38654, 914.60364, 565.652, 689.1452]
2025-09-12 09:01:30,327 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [426.0, 162.0, 188.0, 193.0, 133.0, 216.0, 203.0, 289.0, 210.0, 291.0]
2025-09-12 09:01:30,333 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 28/100 (estimated time remaining: 16 hours, 11 minutes, 3 seconds)
2025-09-12 09:13:38,550 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:13:38,553 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:14:42,779 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 567.43719 ± 163.011
2025-09-12 09:14:42,789 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [680.26294, 587.7269, 786.945, 537.2612, 296.55258, 382.43158, 598.4256, 823.29047, 396.66638, 584.8091]
2025-09-12 09:14:42,789 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [254.0, 223.0, 270.0, 222.0, 125.0, 205.0, 217.0, 283.0, 173.0, 249.0]
2025-09-12 09:14:42,798 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 29/100 (estimated time remaining: 15 hours, 57 minutes, 50 seconds)
2025-09-12 09:26:58,626 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:26:58,630 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:28:10,517 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 625.71680 ± 112.690
2025-09-12 09:28:10,520 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [764.0527, 663.0102, 480.9048, 453.22818, 616.046, 621.85565, 713.506, 620.5117, 511.591, 812.46216]
2025-09-12 09:28:10,520 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [297.0, 262.0, 200.0, 184.0, 229.0, 240.0, 310.0, 229.0, 230.0, 307.0]
2025-09-12 09:28:10,528 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 30/100 (estimated time remaining: 15 hours, 48 minutes, 1 second)
2025-09-12 09:40:04,971 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:40:04,975 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:41:47,490 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 1017.54724 ± 243.078
2025-09-12 09:41:47,491 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [981.1035, 1209.0764, 1256.033, 1392.0499, 689.882, 831.43536, 727.04443, 1308.9093, 967.07477, 812.86475]
2025-09-12 09:41:47,491 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [329.0, 393.0, 444.0, 504.0, 256.0, 280.0, 258.0, 490.0, 327.0, 312.0]
2025-09-12 09:41:47,491 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1226 [INFO]: New best (1017.55) for latency MM1Queue_a033_s075
2025-09-12 09:41:47,511 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 31/100 (estimated time remaining: 15 hours, 36 minutes, 9 seconds)
2025-09-12 09:53:39,230 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:53:39,233 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 09:55:00,368 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 798.55609 ± 216.749
2025-09-12 09:55:00,369 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [709.70636, 1188.9891, 1174.1604, 794.49176, 518.7462, 715.32745, 698.9606, 645.59155, 925.0074, 614.58026]
2025-09-12 09:55:00,369 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [248.0, 413.0, 379.0, 299.0, 191.0, 264.0, 257.0, 234.0, 317.0, 243.0]
2025-09-12 09:55:00,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 32/100 (estimated time remaining: 15 hours, 22 minutes, 51 seconds)
2025-09-12 10:07:14,688 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:07:14,691 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:08:32,550 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 726.81824 ± 168.695
2025-09-12 10:08:32,557 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [836.00006, 375.6769, 932.0214, 611.2734, 795.9247, 816.9318, 837.9056, 803.575, 777.43256, 481.44138]
2025-09-12 10:08:32,557 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [274.0, 153.0, 321.0, 240.0, 308.0, 300.0, 303.0, 311.0, 303.0, 196.0]
2025-09-12 10:08:32,562 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 33/100 (estimated time remaining: 15 hours, 11 minutes, 42 seconds)
2025-09-12 10:20:37,835 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:20:37,844 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:22:01,409 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 823.75818 ± 151.936
2025-09-12 10:22:01,410 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [819.40533, 1084.3605, 1066.5076, 817.89655, 544.46576, 772.5801, 884.98395, 705.733, 775.63556, 766.01404]
2025-09-12 10:22:01,411 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [303.0, 350.0, 343.0, 273.0, 215.0, 283.0, 302.0, 263.0, 281.0, 288.0]
2025-09-12 10:22:01,419 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 34/100 (estimated time remaining: 15 hours, 1 minute, 57 seconds)
2025-09-12 10:34:04,262 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:34:04,264 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:35:52,331 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 1099.32837 ± 445.355
2025-09-12 10:35:52,340 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [1134.8483, 790.5642, 994.84845, 1657.8414, 869.81885, 2158.115, 1020.0801, 936.83417, 891.8216, 538.5103]
2025-09-12 10:35:52,340 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [397.0, 276.0, 337.0, 528.0, 316.0, 707.0, 345.0, 332.0, 321.0, 215.0]
2025-09-12 10:35:52,340 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1226 [INFO]: New best (1099.33) for latency MM1Queue_a033_s075
2025-09-12 10:35:52,352 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 35/100 (estimated time remaining: 14 hours, 53 minutes, 36 seconds)
2025-09-12 10:48:17,561 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:48:17,577 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 10:49:36,592 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 802.00647 ± 218.075
2025-09-12 10:49:36,593 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [295.3142, 953.8644, 905.5446, 1171.8759, 753.9112, 655.6262, 798.1914, 855.99084, 917.943, 711.8026]
2025-09-12 10:49:36,593 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [132.0, 306.0, 340.0, 405.0, 233.0, 234.0, 261.0, 271.0, 316.0, 242.0]
2025-09-12 10:49:36,607 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 36/100 (estimated time remaining: 14 hours, 41 minutes, 38 seconds)
2025-09-12 11:01:28,698 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:01:28,706 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:02:58,323 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 939.91913 ± 190.421
2025-09-12 11:02:58,324 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [973.57074, 963.34143, 838.8347, 1054.8663, 579.58167, 1005.249, 1322.2443, 989.3559, 964.9037, 707.2444]
2025-09-12 11:02:58,324 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [315.0, 325.0, 266.0, 347.0, 216.0, 334.0, 430.0, 320.0, 322.0, 235.0]
2025-09-12 11:02:58,332 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 37/100 (estimated time remaining: 14 hours, 29 minutes, 57 seconds)
2025-09-12 11:14:56,978 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:14:56,986 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:16:04,543 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 671.62903 ± 200.384
2025-09-12 11:16:04,544 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [785.6284, 362.6242, 517.3163, 866.91516, 387.3535, 481.53064, 836.69916, 913.51636, 738.3012, 826.4053]
2025-09-12 11:16:04,544 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [252.0, 149.0, 186.0, 271.0, 164.0, 188.0, 301.0, 303.0, 264.0, 270.0]
2025-09-12 11:16:04,588 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 38/100 (estimated time remaining: 14 hours, 10 minutes, 55 seconds)
2025-09-12 11:28:12,019 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:28:12,022 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:29:39,007 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 907.16730 ± 278.169
2025-09-12 11:29:39,009 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [791.5614, 1236.0436, 914.96045, 942.1037, 1406.1626, 789.6661, 1065.6298, 556.6251, 418.95828, 949.9615]
2025-09-12 11:29:39,009 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [252.0, 395.0, 332.0, 293.0, 458.0, 252.0, 344.0, 198.0, 150.0, 314.0]
2025-09-12 11:29:39,034 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 39/100 (estimated time remaining: 13 hours, 58 minutes, 34 seconds)
2025-09-12 11:42:01,291 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:42:01,297 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:43:34,163 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 927.72937 ± 407.518
2025-09-12 11:43:34,164 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [1205.4816, 1119.577, 1000.94415, 413.48853, 791.3516, 361.99753, 380.94708, 1578.4259, 1376.3878, 1048.6927]
2025-09-12 11:43:34,164 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [397.0, 374.0, 325.0, 175.0, 266.0, 150.0, 149.0, 542.0, 509.0, 376.0]
2025-09-12 11:43:34,172 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 40/100 (estimated time remaining: 13 hours, 45 minutes, 54 seconds)
2025-09-12 11:56:10,568 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:56:10,572 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 11:59:45,968 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 2255.69751 ± 953.308
2025-09-12 11:59:45,976 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [407.51266, 2974.1694, 2631.8967, 1124.7006, 1051.5472, 2342.9436, 2879.8835, 3063.9534, 2995.553, 3084.8145]
2025-09-12 11:59:45,976 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [175.0, 1000.0, 831.0, 417.0, 381.0, 742.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 11:59:45,976 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1226 [INFO]: New best (2255.70) for latency MM1Queue_a033_s075
2025-09-12 11:59:45,994 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 41/100 (estimated time remaining: 14 hours, 1 minute, 52 seconds)
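The "estimated time remaining" printed at each iteration is consistent with simple linear extrapolation from wall-clock time so far (note how the estimate jumps at iteration 41 after the long evaluation at iteration 40). A minimal sketch of such an estimate, with a hypothetical helper and made-up timestamps — not the actual `training_loop` code:

```python
import datetime

def eta(start: datetime.datetime, now: datetime.datetime,
        done: int, total: int) -> datetime.timedelta:
    """Linear extrapolation: average wall-clock time per finished iteration,
    multiplied by the number of iterations still to run. Hypothetical
    reconstruction of the 'estimated time remaining' line."""
    per_iter = (now - start) / done
    return per_iter * (total - done)

# Made-up example: 40 of 100 iterations finished in 8 hours of wall-clock time
start = datetime.datetime(2025, 9, 12, 3, 10)
now = start + datetime.timedelta(hours=8)
remaining = eta(start, now, done=40, total=100)
print(remaining)  # 12:00:00 — 12 more hours at the same average pace
```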
2025-09-12 12:11:51,155 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:11:51,164 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:14:24,054 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 1697.32593 ± 923.578
2025-09-12 12:14:24,055 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [1350.7052, 3306.7632, 1588.8209, 889.1586, 2494.0032, 3289.408, 1282.1464, 811.305, 985.1293, 975.8208]
2025-09-12 12:14:24,055 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [406.0, 1000.0, 521.0, 295.0, 738.0, 1000.0, 393.0, 263.0, 326.0, 314.0]
2025-09-12 12:14:24,088 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 42/100 (estimated time remaining: 14 hours, 2 minutes, 51 seconds)
2025-09-12 12:25:51,459 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:25:51,468 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:29:18,207 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 2244.06689 ± 960.674
2025-09-12 12:29:18,208 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [400.67715, 3149.6504, 1763.9849, 2635.3115, 1361.3441, 1057.0222, 3158.04, 2868.731, 3016.968, 3028.9382]
2025-09-12 12:29:18,208 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [160.0, 1000.0, 624.0, 772.0, 436.0, 333.0, 1000.0, 862.0, 1000.0, 936.0]
2025-09-12 12:29:18,218 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 43/100 (estimated time remaining: 14 hours, 9 minutes, 26 seconds)
2025-09-12 12:41:12,163 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:41:12,173 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 12:44:55,815 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 2531.80786 ± 890.090
2025-09-12 12:44:55,816 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [3284.462, 3200.6836, 665.2058, 2040.685, 2915.409, 1486.8018, 1947.6621, 3214.3677, 3362.737, 3200.0637]
2025-09-12 12:44:55,816 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 239.0, 662.0, 829.0, 443.0, 561.0, 1000.0, 1000.0, 1000.0]
2025-09-12 12:44:55,816 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1226 [INFO]: New best (2531.81) for latency MM1Queue_a033_s075
2025-09-12 12:44:55,831 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 44/100 (estimated time remaining: 14 hours, 18 minutes, 11 seconds)
2025-09-12 12:57:18,598 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:57:18,604 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:01:18,417 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 2514.33643 ± 929.739
2025-09-12 13:01:18,419 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [3069.669, 3051.7627, 3091.6484, 2916.3008, 2154.0066, 3057.9966, 3126.5164, 3119.718, 356.69223, 1199.0544]
2025-09-12 13:01:18,419 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 684.0, 1000.0, 1000.0, 1000.0, 160.0, 480.0]
2025-09-12 13:01:18,433 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 45/100 (estimated time remaining: 14 hours, 30 minutes, 39 seconds)
2025-09-12 13:13:28,049 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:13:28,056 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:17:55,962 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 2923.92188 ± 445.520
2025-09-12 13:17:55,963 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [3276.833, 3253.8726, 3186.0588, 2778.6182, 3047.4048, 3076.7893, 3083.5732, 2581.3616, 3212.654, 1742.0522]
2025-09-12 13:17:55,964 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 833.0, 1000.0, 1000.0, 1000.0, 807.0, 1000.0, 563.0]
2025-09-12 13:17:55,964 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1226 [INFO]: New best (2923.92) for latency MM1Queue_a033_s075
2025-09-12 13:17:55,997 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 46/100 (estimated time remaining: 14 hours, 19 minutes, 50 seconds)
2025-09-12 13:30:35,572 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:30:35,583 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:34:01,206 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 2282.38208 ± 1166.678
2025-09-12 13:34:01,206 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [3275.073, 3466.8208, 285.6048, 1717.1251, 2007.2792, 239.34073, 3220.1848, 3349.677, 2181.606, 3081.1084]
2025-09-12 13:34:01,206 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 124.0, 574.0, 650.0, 117.0, 1000.0, 1000.0, 658.0, 1000.0]
2025-09-12 13:34:01,222 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 47/100 (estimated time remaining: 14 hours, 19 minutes, 53 seconds)
2025-09-12 13:46:06,625 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:46:06,631 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 13:49:47,686 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 2501.63257 ± 1080.933
2025-09-12 13:49:47,689 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [1173.4015, 3148.4778, 3327.0847, 180.53711, 3340.214, 3326.8286, 1609.974, 2342.1853, 3236.6917, 3330.9321]
2025-09-12 13:49:47,689 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [391.0, 1000.0, 1000.0, 95.0, 1000.0, 1000.0, 482.0, 671.0, 1000.0, 1000.0]
2025-09-12 13:49:47,703 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 48/100 (estimated time remaining: 14 hours, 13 minutes, 12 seconds)
2025-09-12 14:01:10,219 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:01:10,225 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:05:34,315 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 2864.04907 ± 856.510
2025-09-12 14:05:34,322 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [3095.1863, 3167.1274, 3129.3096, 299.75513, 3083.4253, 3254.5496, 3224.889, 3079.0732, 3161.3765, 3145.7986]
2025-09-12 14:05:34,322 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 142.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 14:05:34,363 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 49/100 (estimated time remaining: 13 hours, 58 minutes, 40 seconds)
2025-09-12 14:17:38,899 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:17:38,904 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:22:01,812 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 3068.17407 ± 844.044
2025-09-12 14:22:01,813 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [559.7864, 3273.8655, 3225.4836, 3224.3352, 3365.0647, 3176.0803, 3383.4114, 3452.426, 3544.4587, 3476.8281]
2025-09-12 14:22:01,813 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [193.0, 1000.0, 1000.0, 1000.0, 1000.0, 942.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 14:22:01,813 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1226 [INFO]: New best (3068.17) for latency MM1Queue_a033_s075
2025-09-12 14:22:01,819 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 50/100 (estimated time remaining: 13 hours, 43 minutes, 22 seconds)
2025-09-12 14:34:08,696 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:34:08,699 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:38:11,120 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 2679.77539 ± 944.805
2025-09-12 14:38:11,127 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [3260.277, 3117.991, 3095.858, 3401.3186, 2355.0598, 3218.3225, 3286.4553, 256.2888, 3079.0627, 1727.12]
2025-09-12 14:38:11,127 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 742.0, 1000.0, 1000.0, 125.0, 1000.0, 524.0]
2025-09-12 14:38:11,193 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 51/100 (estimated time remaining: 13 hours, 22 minutes, 31 seconds)
2025-09-12 14:50:19,757 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:50:19,763 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 14:54:19,995 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 2767.51123 ± 974.602
2025-09-12 14:54:19,996 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [3136.5417, 3339.9504, 111.47121, 2699.7395, 3343.7695, 2202.7195, 3513.8303, 3406.3704, 2582.9316, 3337.7869]
2025-09-12 14:54:19,996 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 85.0, 778.0, 1000.0, 635.0, 1000.0, 1000.0, 765.0, 1000.0]
2025-09-12 14:54:20,036 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 52/100 (estimated time remaining: 13 hours, 7 minutes, 4 seconds)
2025-09-12 15:06:27,970 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:06:27,990 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:10:46,257 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 2867.56323 ± 743.054
2025-09-12 15:10:46,260 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [1186.7432, 1605.2076, 3244.9075, 3122.3118, 3276.731, 3268.2383, 3224.5115, 3205.313, 3288.1772, 3253.4927]
2025-09-12 15:10:46,260 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [389.0, 510.0, 1000.0, 1000.0, 1000.0, 1000.0, 997.0, 1000.0, 1000.0, 1000.0]
2025-09-12 15:10:46,278 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 53/100 (estimated time remaining: 12 hours, 57 minutes, 22 seconds)
2025-09-12 15:23:24,645 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:23:24,651 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:28:01,577 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 3131.22803 ± 352.119
2025-09-12 15:28:01,578 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [3261.8823, 3334.605, 3337.8113, 2540.3645, 2342.943, 3346.085, 3417.5764, 3212.54, 3278.1392, 3240.3318]
2025-09-12 15:28:01,578 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 770.0, 722.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 15:28:01,578 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1226 [INFO]: New best (3131.23) for latency MM1Queue_a033_s075
2025-09-12 15:28:01,594 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 54/100 (estimated time remaining: 12 hours, 55 minutes, 3 seconds)
2025-09-12 15:40:20,911 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:40:20,916 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 15:44:50,763 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 3028.50269 ± 745.726
2025-09-12 15:44:50,785 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [3369.0254, 3265.069, 3400.81, 3240.3503, 3140.0112, 817.79645, 3039.4114, 3451.042, 3274.1746, 3287.3386]
2025-09-12 15:44:50,785 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 272.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 15:44:50,799 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 55/100 (estimated time remaining: 12 hours, 41 minutes, 54 seconds)
2025-09-12 15:57:05,974 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:57:05,980 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:00:45,227 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 2430.71191 ± 1326.013
2025-09-12 16:00:45,228 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [194.87032, 833.15326, 3324.7449, 3176.3584, 3324.6294, 3337.1614, 236.9941, 3371.4321, 3213.25, 3294.5261]
2025-09-12 16:00:45,228 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [95.0, 303.0, 1000.0, 1000.0, 1000.0, 1000.0, 116.0, 1000.0, 1000.0, 1000.0]
2025-09-12 16:00:45,296 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 56/100 (estimated time remaining: 12 hours, 23 minutes, 6 seconds)
2025-09-12 16:12:52,869 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:12:52,876 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:17:34,770 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 3087.11548 ± 322.808
2025-09-12 16:17:34,799 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [2161.2146, 3306.7854, 3080.2078, 3192.4377, 3261.1826, 3237.0286, 3271.3254, 3123.4363, 3251.6772, 2985.8594]
2025-09-12 16:17:34,799 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [693.0, 1000.0, 1000.0, 1000.0, 1000.0, 993.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 16:17:34,823 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 57/100 (estimated time remaining: 12 hours, 12 minutes, 34 seconds)
2025-09-12 16:29:17,539 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:29:17,555 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:31:45,127 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 1789.92554 ± 1094.450
2025-09-12 16:31:45,128 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [1772.5118, 2143.041, 1682.7444, 3723.6714, 3735.391, 357.88544, 790.52026, 961.6576, 1699.3636, 1032.4696]
2025-09-12 16:31:45,128 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [492.0, 590.0, 488.0, 977.0, 1000.0, 160.0, 272.0, 306.0, 485.0, 315.0]
2025-09-12 16:31:45,155 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 58/100 (estimated time remaining: 11 hours, 36 minutes, 26 seconds)
2025-09-12 16:44:30,646 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:44:30,652 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 16:49:21,284 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 3415.05811 ± 67.294
2025-09-12 16:49:21,284 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [3440.066, 3407.503, 3382.3574, 3261.5674, 3449.2522, 3343.9077, 3508.3865, 3429.1702, 3465.933, 3462.4363]
2025-09-12 16:49:21,284 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 993.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 16:49:21,284 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1226 [INFO]: New best (3415.06) for latency MM1Queue_a033_s075
2025-09-12 16:49:21,293 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 59/100 (estimated time remaining: 11 hours, 23 minutes, 9 seconds)
2025-09-12 17:01:25,137 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:01:25,147 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:06:12,005 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 3303.11377 ± 214.957
2025-09-12 17:06:12,006 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [3438.4226, 3428.1553, 3297.4526, 3371.0405, 3245.0344, 3266.0923, 2695.3782, 3401.8623, 3447.34, 3440.359]
2025-09-12 17:06:12,006 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 818.0, 1000.0, 1000.0, 1000.0]
2025-09-12 17:06:12,050 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 60/100 (estimated time remaining: 11 hours, 7 minutes, 6 seconds)
2025-09-12 17:18:20,825 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:18:20,830 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:23:10,440 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 3259.00049 ± 87.909
2025-09-12 17:23:10,442 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [3352.5894, 3091.731, 3301.4673, 3196.337, 3275.647, 3178.4973, 3415.7703, 3267.0815, 3214.296, 3296.5886]
2025-09-12 17:23:10,442 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 17:23:10,452 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 61/100 (estimated time remaining: 10 hours, 59 minutes, 21 seconds)
2025-09-12 17:34:21,676 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:34:21,681 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:39:12,168 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 3336.52466 ± 58.775
2025-09-12 17:39:12,170 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [3355.8735, 3344.7979, 3384.5977, 3359.8677, 3202.3242, 3316.9685, 3327.9902, 3272.6223, 3417.6172, 3382.5876]
2025-09-12 17:39:12,170 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 17:39:12,187 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 62/100 (estimated time remaining: 10 hours, 36 minutes, 39 seconds)
2025-09-12 17:51:45,304 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:51:45,311 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 17:56:10,219 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 3142.19629 ± 934.531
2025-09-12 17:56:10,220 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [3374.9963, 3405.181, 3511.9175, 3591.8708, 3533.5872, 3359.2056, 3410.7266, 3631.2495, 3244.759, 358.46933]
2025-09-12 17:56:10,220 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 156.0]
2025-09-12 17:56:10,257 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 63/100 (estimated time remaining: 10 hours, 41 minutes, 34 seconds)
2025-09-12 18:08:38,215 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:08:38,220 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:13:28,372 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 3397.04736 ± 47.295
2025-09-12 18:13:28,373 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [3373.3682, 3309.6301, 3418.6802, 3405.7393, 3349.5947, 3402.9487, 3360.6646, 3421.2031, 3450.5977, 3478.0469]
2025-09-12 18:13:28,373 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 18:13:28,395 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 64/100 (estimated time remaining: 10 hours, 22 minutes, 28 seconds)
2025-09-12 18:25:35,425 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:25:35,430 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:29:19,200 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 2642.91650 ± 1277.555
2025-09-12 18:29:19,207 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [3467.386, 3541.925, 3432.9692, 121.32058, 3571.7588, 2981.7102, 2336.812, 3428.4146, 3298.336, 248.5334]
2025-09-12 18:29:19,207 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 135.0, 1000.0, 835.0, 690.0, 1000.0, 1000.0, 116.0]
2025-09-12 18:29:19,229 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 65/100 (estimated time remaining: 9 hours, 58 minutes, 27 seconds)
2025-09-12 18:41:25,289 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:41:25,311 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 18:45:32,651 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 2965.91553 ± 1100.038
2025-09-12 18:45:32,652 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [3505.332, 1549.5085, 3434.0603, 3505.2876, 3632.1084, 3453.4749, 3489.36, 3510.1616, 165.74368, 3414.1199]
2025-09-12 18:45:32,652 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 462.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 91.0, 1000.0]
2025-09-12 18:45:32,665 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 66/100 (estimated time remaining: 9 hours, 36 minutes, 35 seconds)
2025-09-12 18:57:45,315 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:57:45,320 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:02:36,196 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 3409.21753 ± 57.455
2025-09-12 19:02:36,197 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [3445.4712, 3490.2744, 3396.0088, 3446.7773, 3320.8503, 3342.2139, 3456.441, 3329.772, 3460.8738, 3403.4944]
2025-09-12 19:02:36,197 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 19:02:36,214 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 67/100 (estimated time remaining: 9 hours, 27 minutes, 7 seconds)
2025-09-12 19:14:45,528 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:14:45,535 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:19:07,102 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 3046.29541 ± 695.554
2025-09-12 19:19:07,109 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [3442.6338, 3329.239, 1155.8417, 3382.137, 3331.7908, 3424.355, 3295.7505, 3431.2947, 2393.3748, 3276.5342]
2025-09-12 19:19:07,110 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 355.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 708.0, 965.0]
2025-09-12 19:19:07,118 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 68/100 (estimated time remaining: 9 hours, 7 minutes, 27 seconds)
2025-09-12 19:30:36,888 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:30:36,912 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:34:37,298 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 2889.20215 ± 877.944
2025-09-12 19:34:37,299 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [3484.7793, 3501.9492, 3489.511, 3604.0737, 3364.7876, 2916.0593, 1075.0825, 2692.806, 3390.0034, 1372.9684]
2025-09-12 19:34:37,299 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 863.0, 335.0, 749.0, 1000.0, 440.0]
2025-09-12 19:34:37,335 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 69/100 (estimated time remaining: 8 hours, 39 minutes, 21 seconds)
2025-09-12 19:47:04,243 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:47:04,255 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 19:51:26,884 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 3071.94165 ± 837.077
2025-09-12 19:51:26,886 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [3479.544, 3499.561, 3304.4846, 643.23096, 3401.918, 3407.793, 3398.2302, 3505.0664, 2733.6343, 3345.9531]
2025-09-12 19:51:26,886 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 227.0, 1000.0, 1000.0, 1000.0, 1000.0, 818.0, 1000.0]
2025-09-12 19:51:26,925 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 70/100 (estimated time remaining: 8 hours, 29 minutes, 11 seconds)
2025-09-12 20:03:15,119 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:03:15,125 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:07:34,361 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 3205.61011 ± 708.020
2025-09-12 20:07:34,367 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [3569.8892, 3144.7087, 3545.4307, 3450.414, 3573.4822, 2959.6125, 1168.2019, 3575.7976, 3510.3406, 3558.2268]
2025-09-12 20:07:34,367 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 870.0, 1000.0, 946.0, 1000.0, 829.0, 360.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:07:34,378 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 71/100 (estimated time remaining: 8 hours, 12 minutes, 10 seconds)
2025-09-12 20:19:53,585 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:19:53,590 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:24:32,968 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 3388.32300 ± 350.660
2025-09-12 20:24:32,969 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [3492.274, 3491.789, 3516.3882, 3566.9563, 3446.3076, 2346.4287, 3538.2832, 3474.9458, 3587.7344, 3422.1223]
2025-09-12 20:24:32,969 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 655.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:24:32,983 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 72/100 (estimated time remaining: 7 hours, 55 minutes, 17 seconds)
2025-09-12 20:36:51,568 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:36:51,575 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:41:40,162 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 3506.67310 ± 57.443
2025-09-12 20:41:40,163 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [3490.8535, 3580.6057, 3422.9297, 3541.4502, 3540.4517, 3543.7056, 3416.4165, 3462.9888, 3583.837, 3483.4927]
2025-09-12 20:41:40,163 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:41:40,163 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1226 [INFO]: New best (3506.67) for latency MM1Queue_a033_s075
2025-09-12 20:41:40,173 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 73/100 (estimated time remaining: 7 hours, 42 minutes, 17 seconds)
2025-09-12 20:53:50,027 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:53:50,039 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 20:58:39,803 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 3410.55615 ± 69.989
2025-09-12 20:58:39,804 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [3552.386, 3443.44, 3323.3699, 3499.3694, 3354.4094, 3393.2773, 3352.9307, 3446.6348, 3391.252, 3348.494]
2025-09-12 20:58:39,804 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:58:39,816 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 74/100 (estimated time remaining: 7 hours, 33 minutes, 49 seconds)
2025-09-12 21:10:50,780 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:10:50,799 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:15:16,985 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 3186.77319 ± 990.829
2025-09-12 21:15:17,002 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [3463.6985, 3600.3647, 3669.7747, 226.46315, 3451.9985, 3498.5251, 3537.9224, 3340.5645, 3610.07, 3468.3518]
2025-09-12 21:15:17,002 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 112.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:15:17,027 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 75/100 (estimated time remaining: 7 hours, 15 minutes, 56 seconds)
2025-09-12 21:27:26,538 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:27:26,552 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:32:18,533 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 3507.06641 ± 68.470
2025-09-12 21:32:18,534 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [3499.7659, 3462.89, 3409.6785, 3403.1143, 3579.8098, 3599.8792, 3451.8115, 3553.895, 3575.6018, 3534.219]
2025-09-12 21:32:18,534 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:32:18,534 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1226 [INFO]: New best (3507.07) for latency MM1Queue_a033_s075
2025-09-12 21:32:18,555 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 76/100 (estimated time remaining: 7 hours, 3 minutes, 40 seconds)
2025-09-12 21:44:29,393 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:44:29,423 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 21:49:17,001 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 3445.14136 ± 65.194
2025-09-12 21:49:17,025 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [3523.7197, 3501.1948, 3447.105, 3509.7295, 3466.2012, 3425.8665, 3411.1858, 3366.682, 3308.2498, 3491.4824]
2025-09-12 21:49:17,025 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:49:17,044 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 77/100 (estimated time remaining: 6 hours, 46 minutes, 43 seconds)
2025-09-12 22:01:24,242 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:01:24,254 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:06:13,775 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 3581.61328 ± 73.034
2025-09-12 22:06:13,776 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [3578.7483, 3510.6514, 3660.224, 3647.5872, 3577.8887, 3641.5195, 3616.6125, 3401.9055, 3574.945, 3606.0488]
2025-09-12 22:06:13,776 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:06:13,776 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1226 [INFO]: New best (3581.61) for latency MM1Queue_a033_s075
2025-09-12 22:06:13,839 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 78/100 (estimated time remaining: 6 hours, 28 minutes, 58 seconds)
2025-09-12 22:18:49,932 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:18:49,954 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:23:39,790 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 3604.65088 ± 62.205
2025-09-12 22:23:39,792 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [3684.236, 3579.0122, 3668.7725, 3602.943, 3651.6562, 3666.7666, 3480.8147, 3580.5972, 3529.817, 3601.893]
2025-09-12 22:23:39,792 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:23:39,792 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1226 [INFO]: New best (3604.65) for latency MM1Queue_a033_s075
2025-09-12 22:23:39,807 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 79/100 (estimated time remaining: 6 hours, 13 minutes, 59 seconds)
2025-09-12 22:35:50,273 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:35:50,285 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:40:22,943 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 3418.83472 ± 527.418
2025-09-12 22:40:22,943 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [3620.8489, 3749.926, 1851.8273, 3636.7358, 3503.4626, 3621.4648, 3526.7886, 3549.3342, 3631.924, 3496.0364]
2025-09-12 22:40:22,944 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 516.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:40:22,984 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 80/100 (estimated time remaining: 5 hours, 57 minutes, 25 seconds)
2025-09-12 22:52:32,113 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:52:32,119 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 22:57:11,038 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 3455.58008 ± 464.780
2025-09-12 22:57:11,050 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [3594.0923, 3626.6748, 3621.0762, 3569.0513, 3626.1094, 3606.5027, 3755.1248, 3601.0894, 3481.8142, 2074.2646]
2025-09-12 22:57:11,050 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 589.0]
2025-09-12 22:57:11,086 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 81/100 (estimated time remaining: 5 hours, 39 minutes, 30 seconds)
2025-09-12 23:08:44,092 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:08:44,102 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 23:13:04,044 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 3175.68921 ± 918.204
2025-09-12 23:13:04,045 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [654.58014, 3559.3906, 3654.0889, 3577.7217, 3596.7258, 3562.1794, 3615.65, 3515.9827, 2359.6895, 3660.8833]
2025-09-12 23:13:04,045 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [224.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 674.0, 1000.0]
2025-09-12 23:13:04,080 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 82/100 (estimated time remaining: 5 hours, 18 minutes, 22 seconds)
2025-09-12 23:24:56,378 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:24:56,388 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 23:29:45,620 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 3597.44580 ± 123.684
2025-09-12 23:29:45,627 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [3648.4148, 3633.1775, 3566.7334, 3635.4263, 3243.1072, 3577.153, 3674.8975, 3671.6848, 3640.4653, 3683.397]
2025-09-12 23:29:45,627 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 913.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 23:29:45,671 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 83/100 (estimated time remaining: 5 hours, 42 seconds)
2025-09-12 23:41:59,268 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:41:59,281 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-12 23:46:47,832 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 3649.93237 ± 61.280
2025-09-12 23:46:47,833 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [3673.4375, 3631.0457, 3733.3494, 3701.047, 3632.3354, 3538.2744, 3683.028, 3549.3723, 3705.4539, 3651.9807]
2025-09-12 23:46:47,833 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 23:46:47,833 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1226 [INFO]: New best (3649.93) for latency MM1Queue_a033_s075
2025-09-12 23:46:47,844 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 84/100 (estimated time remaining: 4 hours, 42 minutes, 39 seconds)
2025-09-12 23:59:06,388 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:59:06,404 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 00:03:49,217 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 3525.49170 ± 229.635
2025-09-13 00:03:49,217 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [3427.59, 3572.434, 3719.8455, 3665.5798, 3578.0703, 3598.8772, 2871.013, 3579.3088, 3605.8252, 3636.3748]
2025-09-13 00:03:49,217 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 768.0, 1000.0, 1000.0, 1000.0]
2025-09-13 00:03:49,255 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 85/100 (estimated time remaining: 4 hours, 27 minutes)
2025-09-13 00:15:58,758 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:15:58,769 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 00:20:22,439 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 3343.71680 ± 783.912
2025-09-13 00:20:22,459 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [3629.9905, 3772.0947, 3763.5376, 1290.2484, 3716.8623, 3673.56, 3744.6807, 2445.7903, 3606.6667, 3793.7366]
2025-09-13 00:20:22,459 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [957.0, 1000.0, 1000.0, 388.0, 1000.0, 1000.0, 1000.0, 646.0, 1000.0, 1000.0]
2025-09-13 00:20:22,473 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 86/100 (estimated time remaining: 4 hours, 9 minutes, 34 seconds)
2025-09-13 00:32:56,478 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:32:56,492 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 00:37:28,348 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 3463.49023 ± 777.598
2025-09-13 00:37:28,355 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [3787.6465, 3716.3098, 3794.089, 3701.3413, 3766.8062, 3680.6333, 3657.3247, 1135.566, 3753.0994, 3642.0854]
2025-09-13 00:37:28,355 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 338.0, 1000.0, 1000.0]
2025-09-13 00:37:28,435 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 87/100 (estimated time remaining: 3 hours, 56 minutes, 20 seconds)
2025-09-13 00:49:31,746 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:49:31,751 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 00:54:21,305 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 3712.05396 ± 65.968
2025-09-13 00:54:21,307 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [3706.9102, 3615.9097, 3707.0417, 3747.0918, 3730.2986, 3625.7385, 3825.7944, 3676.9487, 3674.5784, 3810.226]
2025-09-13 00:54:21,307 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 00:54:21,307 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1226 [INFO]: New best (3712.05) for latency MM1Queue_a033_s075
2025-09-13 00:54:21,323 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 88/100 (estimated time remaining: 3 hours, 39 minutes, 56 seconds)
2025-09-13 01:06:29,918 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:06:29,931 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 01:11:15,950 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 3620.23877 ± 178.356
2025-09-13 01:11:15,957 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [3734.886, 3724.4106, 3673.244, 3626.4075, 3629.09, 3630.892, 3691.34, 3099.5244, 3736.9404, 3655.6528]
2025-09-13 01:11:15,957 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 984.0, 1000.0, 1000.0, 839.0, 1000.0, 1000.0]
2025-09-13 01:11:15,984 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 89/100 (estimated time remaining: 3 hours, 22 minutes, 43 seconds)
2025-09-13 01:23:02,415 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:23:02,426 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 01:27:48,378 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 3730.42505 ± 102.399
2025-09-13 01:27:48,379 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [3838.2104, 3618.0232, 3774.1829, 3807.5066, 3751.744, 3743.0486, 3863.1235, 3674.4592, 3730.2297, 3503.7239]
2025-09-13 01:27:48,379 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 958.0]
2025-09-13 01:27:48,379 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1226 [INFO]: New best (3730.43) for latency MM1Queue_a033_s075
2025-09-13 01:27:48,403 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 90/100 (estimated time remaining: 3 hours, 4 minutes, 46 seconds)
2025-09-13 01:39:47,982 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:39:48,013 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 01:44:33,805 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 3766.34717 ± 38.449
2025-09-13 01:44:33,811 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [3751.88, 3729.2017, 3771.0383, 3710.6013, 3735.3342, 3795.8591, 3813.864, 3728.3652, 3818.8489, 3808.4807]
2025-09-13 01:44:33,811 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 01:44:33,811 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1226 [INFO]: New best (3766.35) for latency MM1Queue_a033_s075
2025-09-13 01:44:33,841 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 91/100 (estimated time remaining: 2 hours, 48 minutes, 22 seconds)
2025-09-13 01:56:39,669 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:56:39,681 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 02:00:42,423 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 3059.97949 ± 1222.632
2025-09-13 02:00:42,424 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [3688.135, 3572.8044, 3647.6003, 3647.9312, 3620.2715, 348.17404, 912.6969, 3665.551, 3806.5205, 3690.1118]
2025-09-13 02:00:42,424 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 146.0, 297.0, 1000.0, 1000.0, 1000.0]
2025-09-13 02:00:42,441 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 92/100 (estimated time remaining: 2 hours, 29 minutes, 49 seconds)
2025-09-13 02:13:56,066 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:13:56,074 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 02:18:47,183 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 3613.11060 ± 40.730
2025-09-13 02:18:47,189 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [3590.6355, 3630.029, 3547.5938, 3590.1719, 3698.819, 3586.5532, 3600.2844, 3646.9512, 3592.4453, 3647.6235]
2025-09-13 02:18:47,189 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 02:18:47,217 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 93/100 (estimated time remaining: 2 hours, 15 minutes, 5 seconds)
2025-09-13 02:30:52,720 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:30:52,733 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 02:35:40,695 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 3734.38818 ± 32.972
2025-09-13 02:35:40,695 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [3702.003, 3740.9978, 3753.7915, 3703.246, 3695.322, 3784.0637, 3776.947, 3690.6584, 3735.7554, 3761.098]
2025-09-13 02:35:40,695 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 02:35:40,728 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 58 minutes, 10 seconds)
2025-09-13 02:47:51,288 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:47:51,307 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 02:52:43,370 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 3687.42700 ± 61.024
2025-09-13 02:52:43,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [3669.9062, 3589.6624, 3659.556, 3685.592, 3696.8333, 3788.028, 3746.927, 3692.3267, 3751.4233, 3594.0142]
2025-09-13 02:52:43,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 02:52:43,396 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 41 minutes, 53 seconds)
2025-09-13 03:04:55,725 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:04:55,737 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 03:08:47,757 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 3057.25049 ± 1491.482
2025-09-13 03:08:47,758 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [3892.7297, 3860.6497, 3815.0981, 3753.084, 26.14402, 3758.6245, 125.64266, 3811.4412, 3764.7085, 3764.3809]
2025-09-13 03:08:47,758 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 29.0, 1000.0, 84.0, 1000.0, 1000.0, 1000.0]
2025-09-13 03:08:47,810 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 24 minutes, 13 seconds)
2025-09-13 03:20:57,472 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:20:57,478 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 03:25:48,763 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 3892.52466 ± 52.421
2025-09-13 03:25:48,770 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [3785.402, 3945.739, 3884.4548, 3947.9116, 3854.5598, 3856.898, 3916.6624, 3866.2695, 3897.1294, 3970.2195]
2025-09-13 03:25:48,770 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 03:25:48,770 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1226 [INFO]: New best (3892.52) for latency MM1Queue_a033_s075
2025-09-13 03:25:48,793 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 97/100 (estimated time remaining: 1 hour, 8 minutes, 5 seconds)
2025-09-13 03:37:26,619 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:37:26,625 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 03:41:50,308 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 3374.37842 ± 1061.688
2025-09-13 03:41:50,309 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [3737.6643, 3809.9666, 194.21666, 3707.4507, 3670.8757, 3732.1006, 3814.014, 3654.0964, 3638.3447, 3785.0564]
2025-09-13 03:41:50,309 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 106.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 03:41:50,321 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 98/100 (estimated time remaining: 49 minutes, 49 seconds)
2025-09-13 03:54:00,360 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:54:00,367 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 03:58:49,781 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 3733.92456 ± 67.292
2025-09-13 03:58:49,782 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [3604.2573, 3687.0913, 3704.0159, 3704.5383, 3759.247, 3740.3904, 3684.6877, 3822.1655, 3818.626, 3814.228]
2025-09-13 03:58:49,782 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 03:58:49,799 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 99/100 (estimated time remaining: 33 minutes, 15 seconds)
2025-09-13 04:11:00,522 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:11:00,533 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 04:15:46,910 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 3823.76099 ± 57.060
2025-09-13 04:15:46,911 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [3765.992, 3813.9714, 3814.6611, 3806.5273, 3713.0479, 3844.3157, 3928.717, 3872.2317, 3805.6687, 3872.4785]
2025-09-13 04:15:46,911 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 04:15:46,954 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1199 [INFO]: Iteration 100/100 (estimated time remaining: 16 minutes, 36 seconds)
2025-09-13 04:27:40,442 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:27:40,463 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-09-13 04:32:25,339 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 3658.73511 ± 228.567
2025-09-13 04:32:25,341 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [3798.3865, 3773.0654, 2989.9563, 3663.8735, 3639.9976, 3686.1501, 3747.2827, 3767.4297, 3783.538, 3737.672]
2025-09-13 04:32:25,341 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 801.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 04:32:25,392 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-walker2d):1251 [DEBUG]: Training session finished
