2025-05-09 09:43:39,754 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-mbpac_memdelay
2025-05-09 09:43:39,754 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-mbpac_memdelay
2025-05-09 09:43:39,754 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x147919f95350>}
2025-05-09 09:43:39,754 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1111 [DEBUG]: using device: cuda
2025-05-09 09:43:39,763 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1133 [INFO]: Creating new trainer
2025-05-09 09:43:39,777 baseline-mbpac-noisy-hopper:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2.]]), shift: tensor([[-1., -1., -1.]]))
)
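The `NNTanhRefit` head at the bottom of the policy reports `scale = [2, 2, 2]` and `shift = [-1, -1, -1]`. `NNTanhRefit` is project-specific and its formula is not shown, so the following is an assumption: one plausible reading is the usual SAC-style squash that maps an unbounded network output into the box `[shift, shift + scale] = [-1, 1]^3`, which matches Hopper's action bounds.

```python
import math

def tanh_refit(u, scale, shift):
    """Hypothetical squash: map tanh's (-1, 1) range onto [shift, shift + scale]."""
    return [s * (math.tanh(x) + 1.0) / 2.0 + sh for x, s, sh in zip(u, scale, shift)]

scale, shift = [2.0, 2.0, 2.0], [-1.0, -1.0, -1.0]

# A zero pre-activation lands at the center of the box.
assert tanh_refit([0.0, 0.0, 0.0], scale, shift) == [0.0, 0.0, 0.0]

# Arbitrary (even extreme) pre-activations stay inside [-1, 1]^3.
for u in ([-100.0, 0.0, 100.0], [3.0, -0.5, 0.1]):
    assert all(-1.0 <= x <= 1.0 for x in tanh_refit(u, scale, shift))
```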
2025-05-09 09:43:39,778 baseline-mbpac-noisy-hopper:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=14, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-09 09:43:39,787 baseline-mbpac-noisy-hopper:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=11, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=11, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=11, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=3, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 384, batch_first=True)
)
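The model printout describes a GRU world model: state and action embedders, a recurrent core, and a Gaussian emitter over the 11-dimensional next observation. The wiring below is inferred purely from the printed shapes and is an assumption: the state embedder (11 → 384) plausibly produces the initial GRU hidden state, each action is embedded (3 → 256) and fed as a GRU input, and the emitter maps the 384-dimensional hidden state to `mu`/`log_std` heads of size 11. A shape-level numpy sketch under those assumptions, with random weights and single linear layers standing in for the printed MLPs:

```python
import numpy as np

rng = np.random.default_rng(0)
S, H, E = 11, 384, 256  # observation dim, GRU hidden size, action-embedding size
ACT = 3                 # action dim

def linear(n_in, n_out):
    return rng.standard_normal((n_in, n_out)) * 0.01, np.zeros(n_out)

W_s, b_s = linear(S, H)      # stands in for net_embed_state (11 -> ... -> 384)
W_a, b_a = linear(ACT, E)    # stands in for net_embed_action (3 -> ... -> 256)
W_mu, b_mu = linear(H, S)    # stands in for the emitter's mu head (384 -> ... -> 11)

# One GRU cell in PyTorch's convention: h' = (1 - z) * n + z * h
Wx = rng.standard_normal((E, 3 * H)) * 0.01
Wh = rng.standard_normal((H, 3 * H)) * 0.01

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h):
    gx, gh = x @ Wx, h @ Wh
    r = sigmoid(gx[:H] + gh[:H])               # reset gate
    z = sigmoid(gx[H:2 * H] + gh[H:2 * H])     # update gate
    n = np.tanh(gx[2 * H:] + r * gh[2 * H:])   # candidate state
    return (1.0 - z) * n + z * h

obs = rng.standard_normal(S)
h = obs @ W_s + b_s                  # assumed: state embedding initializes the hidden state
for _ in range(5):                   # roll the model forward over 5 actions
    a = rng.standard_normal(ACT)
    h = gru_step(a @ W_a + b_a, h)
mu = h @ W_mu + b_mu                 # predicted next-observation mean (log_std head analogous)
assert h.shape == (H,) and mu.shape == (S,)
```

This only demonstrates the dataflow and tensor shapes; the real `NNPredictiveRecurrent` may initialize or concatenate its hidden state differently (the `NNLayerConcat` inside the emitter hints at additional inputs not recoverable from the printout).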
2025-05-09 09:43:40,645 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1194 [DEBUG]: Starting training session...
2025-05-09 09:43:40,646 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 1/100
2025-05-09 09:52:28,081 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 09:52:28,083 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 09:52:36,988 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 65.57805 ± 33.738
2025-05-09 09:52:36,988 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [86.63512, 47.55293, 38.329086, 85.32771, 112.32798, 63.551426, 122.81454, 39.786793, 49.481518, 9.97342]
2025-05-09 09:52:36,988 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [46.0, 27.0, 22.0, 46.0, 60.0, 35.0, 65.0, 23.0, 28.0, 10.0]
2025-05-09 09:52:36,988 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1226 [INFO]: New best (65.58) for latency MM1Queue_a033_s075
2025-05-09 09:52:36,988 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1229 [INFO]: saving network
2025-05-09 09:52:36,998 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-mbpac_memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
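The "Total Reward: 65.57805 ± 33.738" line is the per-episode mean and the population standard deviation (ddof = 0, not the sample estimator) of the ten listed rewards, which can be checked with the standard library:

```python
from statistics import fmean, pstdev

# Per-episode rewards from the "All rewards" line of iteration 1.
rewards = [86.63512, 47.55293, 38.329086, 85.32771, 112.32798,
           63.551426, 122.81454, 39.786793, 49.481518, 9.97342]

mean, std = fmean(rewards), pstdev(rewards)
print(f"{mean:.5f} \u00b1 {std:.3f}")  # prints "65.57805 ± 33.738", matching the log
```

Using `statistics.stdev` (ddof = 1) instead would give ~35.56, which does not match the logged value, so the population form is the one in use here.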
2025-05-09 09:52:37,013 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 2/100 (estimated time remaining: 14 hours, 45 minutes)
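The "estimated time remaining" figures are consistent with simple linear extrapolation of elapsed wall time over the remaining iterations (an assumption about the implementation, not shown in the log). The sketch below reproduces the first estimate from the timestamps of the "Iteration 1/100" and "Iteration 2/100" lines:

```python
from datetime import datetime

t_start = datetime.fromisoformat("2025-05-09 09:43:40.646")  # "Iteration 1/100" logged
t_now   = datetime.fromisoformat("2025-05-09 09:52:37.013")  # "Iteration 2/100" logged

done, total = 1, 100
eta = (t_now - t_start) / done * (total - done)  # average iteration time x iterations left

hours, rem = divmod(int(eta.total_seconds()), 3600)
minutes = rem // 60
print(f"{hours} hours, {minutes} minutes")  # prints "14 hours, 45 minutes", as logged
```

The later estimates drift up and down with per-iteration evaluation time (episodes grow longer as the policy improves), which matches this running-average behavior.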
2025-05-09 10:02:43,113 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 10:02:43,119 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 10:03:39,094 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 240.51338 ± 318.185
2025-05-09 10:03:39,095 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [36.948788, 666.882, 136.06384, 33.860207, 35.67576, 74.91602, 244.50677, 57.86481, 95.697105, 1022.71844]
2025-05-09 10:03:39,095 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [37.0, 638.0, 127.0, 35.0, 38.0, 70.0, 207.0, 54.0, 85.0, 1000.0]
2025-05-09 10:03:39,095 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1226 [INFO]: New best (240.51) for latency MM1Queue_a033_s075
2025-05-09 10:03:39,095 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1229 [INFO]: saving network
2025-05-09 10:03:39,104 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-mbpac_memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 10:03:39,120 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 3/100 (estimated time remaining: 16 hours, 18 minutes, 45 seconds)
2025-05-09 10:13:09,061 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 10:13:09,268 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 10:13:34,452 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 177.30492 ± 103.358
2025-05-09 10:13:34,453 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [26.444046, 86.605644, 96.08647, 162.93498, 174.39742, 268.48688, 382.77106, 84.540016, 227.70929, 263.07336]
2025-05-09 10:13:34,453 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [22.0, 51.0, 62.0, 95.0, 99.0, 184.0, 181.0, 56.0, 146.0, 125.0]
2025-05-09 10:13:34,456 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 4/100 (estimated time remaining: 16 hours, 6 minutes, 39 seconds)
2025-05-09 10:23:05,708 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 10:23:05,711 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 10:23:39,453 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 247.63583 ± 147.499
2025-05-09 10:23:39,453 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [331.8132, 187.67035, 102.863594, 293.34418, 298.11917, 378.40268, 305.53717, 58.85418, 509.7673, 9.98648]
2025-05-09 10:23:39,454 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [251.0, 129.0, 73.0, 119.0, 131.0, 175.0, 131.0, 38.0, 275.0, 40.0]
2025-05-09 10:23:39,454 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1226 [INFO]: New best (247.64) for latency MM1Queue_a033_s075
2025-05-09 10:23:39,454 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1229 [INFO]: saving network
2025-05-09 10:23:39,464 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-mbpac_memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 10:23:39,480 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 5/100 (estimated time remaining: 15 hours, 59 minutes, 32 seconds)
2025-05-09 10:33:28,338 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 10:33:28,342 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 10:34:32,942 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 219.24585 ± 207.374
2025-05-09 10:34:32,945 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [78.109436, 646.3494, 15.611633, 175.14848, 146.94092, 503.96497, 391.122, 77.998634, 11.718561, 145.49458]
2025-05-09 10:34:32,945 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [73.0, 1000.0, 25.0, 233.0, 163.0, 745.0, 322.0, 69.0, 19.0, 186.0]
2025-05-09 10:34:32,947 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 6/100 (estimated time remaining: 16 hours, 6 minutes, 33 seconds)
2025-05-09 10:43:55,235 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 10:43:55,237 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 10:44:43,729 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 308.03915 ± 255.820
2025-05-09 10:44:43,729 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [295.65894, 315.15704, 95.66102, 160.48749, 176.0275, 332.47906, 188.35184, 301.79276, 173.94278, 1040.8331]
2025-05-09 10:44:43,729 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [151.0, 148.0, 58.0, 99.0, 124.0, 160.0, 109.0, 155.0, 98.0, 1000.0]
2025-05-09 10:44:43,729 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1226 [INFO]: New best (308.04) for latency MM1Queue_a033_s075
2025-05-09 10:44:43,729 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1229 [INFO]: saving network
2025-05-09 10:44:43,740 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-mbpac_memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 10:44:44,341 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 7/100 (estimated time remaining: 16 hours, 19 minutes, 53 seconds)
2025-05-09 10:54:33,192 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 10:54:33,194 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 10:54:58,929 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 236.68469 ± 88.862
2025-05-09 10:54:58,929 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [102.12957, 304.1147, 314.85162, 333.12668, 183.46202, 219.35721, 312.07047, 304.99756, 220.88383, 71.85323]
2025-05-09 10:54:58,929 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [56.0, 131.0, 138.0, 140.0, 87.0, 116.0, 130.0, 135.0, 100.0, 42.0]
2025-05-09 10:54:59,439 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 8/100 (estimated time remaining: 15 hours, 54 minutes, 53 seconds)
2025-05-09 11:04:45,302 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 11:04:45,304 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 11:05:20,662 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 350.47232 ± 78.930
2025-05-09 11:05:20,662 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [221.95038, 327.6407, 350.45743, 439.61063, 330.41202, 254.37161, 338.93024, 335.8813, 509.37033, 396.0983]
2025-05-09 11:05:20,662 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [107.0, 142.0, 146.0, 195.0, 136.0, 158.0, 131.0, 134.0, 211.0, 159.0]
2025-05-09 11:05:20,662 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1226 [INFO]: New best (350.47) for latency MM1Queue_a033_s075
2025-05-09 11:05:20,662 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1229 [INFO]: saving network
2025-05-09 11:05:20,672 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-mbpac_memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 11:05:20,690 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 9/100 (estimated time remaining: 15 hours, 52 minutes, 34 seconds)
2025-05-09 11:14:57,691 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 11:14:57,695 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 11:15:48,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 528.12042 ± 270.421
2025-05-09 11:15:48,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [197.39005, 750.60944, 522.0586, 712.8508, 225.4483, 179.4861, 365.9008, 559.3655, 735.0291, 1033.0654]
2025-05-09 11:15:48,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [98.0, 286.0, 215.0, 278.0, 113.0, 93.0, 154.0, 204.0, 267.0, 369.0]
2025-05-09 11:15:48,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1226 [INFO]: New best (528.12) for latency MM1Queue_a033_s075
2025-05-09 11:15:48,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1229 [INFO]: saving network
2025-05-09 11:15:48,834 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-mbpac_memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 11:15:48,941 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 10/100 (estimated time remaining: 15 hours, 49 minutes, 16 seconds)
2025-05-09 11:25:31,301 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 11:25:31,314 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 11:26:10,159 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 415.55606 ± 365.716
2025-05-09 11:26:10,160 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [153.85934, 212.27045, 257.35178, 208.88322, 226.94852, 880.3447, 551.0897, 1299.369, 197.07426, 168.36975]
2025-05-09 11:26:10,160 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [84.0, 117.0, 131.0, 129.0, 120.0, 270.0, 235.0, 415.0, 99.0, 91.0]
2025-05-09 11:26:10,183 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 11/100 (estimated time remaining: 15 hours, 29 minutes, 10 seconds)
2025-05-09 11:35:49,281 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 11:35:49,356 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 11:36:11,487 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 206.88911 ± 202.152
2025-05-09 11:36:11,487 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [521.8975, 241.6493, 99.59057, 73.15073, 33.20603, 49.26856, 47.48478, 429.7246, 547.30164, 25.617409]
2025-05-09 11:36:11,487 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [190.0, 115.0, 57.0, 45.0, 29.0, 33.0, 31.0, 159.0, 204.0, 20.0]
2025-05-09 11:36:11,501 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 12/100 (estimated time remaining: 15 hours, 15 minutes, 51 seconds)
2025-05-09 11:45:53,080 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 11:45:53,109 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 11:46:37,729 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 558.81757 ± 268.352
2025-05-09 11:46:37,730 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [890.5429, 784.5857, 241.3781, 754.0941, 731.95807, 557.3727, 493.3281, 41.874584, 789.845, 303.19653]
2025-05-09 11:46:37,730 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [292.0, 269.0, 114.0, 264.0, 246.0, 197.0, 181.0, 33.0, 254.0, 135.0]
2025-05-09 11:46:37,730 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1226 [INFO]: New best (558.82) for latency MM1Queue_a033_s075
2025-05-09 11:46:37,730 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1229 [INFO]: saving network
2025-05-09 11:46:37,740 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-mbpac_memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 11:46:37,755 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 13/100 (estimated time remaining: 15 hours, 8 minutes, 50 seconds)
2025-05-09 11:56:26,276 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 11:56:26,512 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 11:57:23,039 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 664.01575 ± 440.268
2025-05-09 11:57:23,040 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [778.9804, 280.32382, 33.30471, 1416.0566, 296.06393, 1232.1718, 766.03754, 791.79407, 142.11983, 903.30505]
2025-05-09 11:57:23,040 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [267.0, 129.0, 27.0, 461.0, 136.0, 400.0, 316.0, 337.0, 76.0, 338.0]
2025-05-09 11:57:23,040 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1226 [INFO]: New best (664.02) for latency MM1Queue_a033_s075
2025-05-09 11:57:23,040 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1229 [INFO]: saving network
2025-05-09 11:57:23,050 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-mbpac_memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 11:57:23,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 14/100 (estimated time remaining: 15 hours, 5 minutes, 29 seconds)
2025-05-09 12:07:10,512 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 12:07:10,643 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 12:08:16,688 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 778.99457 ± 667.893
2025-05-09 12:08:17,071 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [384.3542, 1776.3412, 169.94197, 63.520416, 762.39343, 484.8763, 1321.9673, 2056.8516, 158.69115, 611.00824]
2025-05-09 12:08:17,071 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [155.0, 588.0, 82.0, 39.0, 285.0, 179.0, 475.0, 649.0, 80.0, 226.0]
2025-05-09 12:08:17,071 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1226 [INFO]: New best (778.99) for latency MM1Queue_a033_s075
2025-05-09 12:08:17,071 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1229 [INFO]: saving network
2025-05-09 12:08:17,080 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-mbpac_memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 12:08:17,123 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 15/100 (estimated time remaining: 15 hours, 2 minutes, 28 seconds)
2025-05-09 12:17:36,767 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 12:17:36,780 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 12:18:24,746 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 563.45215 ± 244.903
2025-05-09 12:18:24,746 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [452.70154, 140.03008, 828.8731, 480.16266, 762.7608, 156.13805, 842.0342, 665.58215, 550.42957, 755.8094]
2025-05-09 12:18:24,746 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [163.0, 73.0, 298.0, 181.0, 279.0, 76.0, 275.0, 243.0, 203.0, 260.0]
2025-05-09 12:18:24,907 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 16/100 (estimated time remaining: 14 hours, 48 minutes, 10 seconds)
2025-05-09 12:27:33,689 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 12:27:33,837 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 12:28:07,984 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 345.19336 ± 217.923
2025-05-09 12:28:07,984 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [212.52357, 570.5273, 754.4974, 272.5058, 95.251396, 295.54022, 284.82745, 641.8382, 233.52576, 90.89652]
2025-05-09 12:28:07,984 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [126.0, 215.0, 277.0, 138.0, 55.0, 140.0, 135.0, 216.0, 130.0, 53.0]
2025-05-09 12:28:07,987 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 17/100 (estimated time remaining: 14 hours, 32 minutes, 36 seconds)
2025-05-09 12:37:20,160 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 12:37:20,348 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 12:38:22,287 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 809.42188 ± 678.920
2025-05-09 12:38:22,287 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [922.0457, 274.54105, 1079.5284, 264.3674, 159.31668, 72.77445, 2123.9985, 1641.0988, 252.21336, 1304.3342]
2025-05-09 12:38:22,287 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [330.0, 132.0, 363.0, 130.0, 85.0, 45.0, 651.0, 525.0, 131.0, 403.0]
2025-05-09 12:38:22,287 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1226 [INFO]: New best (809.42) for latency MM1Queue_a033_s075
2025-05-09 12:38:22,287 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1229 [INFO]: saving network
2025-05-09 12:38:22,297 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-mbpac_memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 12:38:22,328 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 18/100 (estimated time remaining: 14 hours, 18 minutes, 55 seconds)
2025-05-09 12:47:09,344 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 12:47:09,348 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 12:48:29,579 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 1012.27863 ± 931.714
2025-05-09 12:48:29,582 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [1076.6315, 105.13634, 1565.4357, 658.38495, 1028.5433, 3023.5344, 256.379, 37.666508, 237.92673, 2133.148]
2025-05-09 12:48:29,582 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [392.0, 60.0, 526.0, 243.0, 322.0, 1000.0, 130.0, 29.0, 130.0, 708.0]
2025-05-09 12:48:29,583 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1226 [INFO]: New best (1012.28) for latency MM1Queue_a033_s075
2025-05-09 12:48:29,583 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1229 [INFO]: saving network
2025-05-09 12:48:29,592 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-mbpac_memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 12:48:29,609 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 19/100 (estimated time remaining: 13 hours, 58 minutes, 10 seconds)
2025-05-09 12:57:33,522 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 12:57:33,526 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 12:59:26,797 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 1320.99438 ± 849.532
2025-05-09 12:59:26,801 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [1637.227, 1169.1759, 2465.0288, 172.358, 2266.2588, 2329.7842, 350.67044, 1122.3844, 97.79369, 1599.2623]
2025-05-09 12:59:26,801 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [550.0, 407.0, 868.0, 88.0, 741.0, 760.0, 151.0, 352.0, 56.0, 575.0]
2025-05-09 12:59:26,801 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1226 [INFO]: New best (1320.99) for latency MM1Queue_a033_s075
2025-05-09 12:59:26,801 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1229 [INFO]: saving network
2025-05-09 12:59:26,811 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-mbpac_memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 12:59:26,831 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 20/100 (estimated time remaining: 13 hours, 48 minutes, 49 seconds)
2025-05-09 13:09:02,844 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 13:09:03,097 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 13:10:00,649 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 637.20233 ± 421.302
2025-05-09 13:10:00,650 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [539.7694, 1138.0995, 552.2156, 216.61176, 98.3954, 1302.0112, 168.91533, 348.17477, 1118.7537, 889.0773]
2025-05-09 13:10:00,650 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [204.0, 411.0, 218.0, 104.0, 56.0, 449.0, 88.0, 149.0, 387.0, 278.0]
2025-05-09 13:10:00,654 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 21/100 (estimated time remaining: 13 hours, 45 minutes, 31 seconds)
2025-05-09 13:19:10,618 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 13:19:10,621 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 13:20:18,413 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 799.80493 ± 553.809
2025-05-09 13:20:18,613 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [176.85829, 1611.891, 1595.0718, 895.56287, 226.34575, 557.253, 458.2296, 1534.0023, 715.13086, 227.7043]
2025-05-09 13:20:18,613 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [90.0, 543.0, 534.0, 304.0, 106.0, 211.0, 188.0, 558.0, 249.0, 108.0]
2025-05-09 13:20:18,617 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 22/100 (estimated time remaining: 13 hours, 44 minutes, 23 seconds)
2025-05-09 13:30:04,975 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 13:30:05,226 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 13:32:20,406 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 1689.34473 ± 1057.115
2025-05-09 13:32:20,771 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [174.5896, 2701.0366, 1160.7936, 2975.48, 2952.7324, 1488.683, 566.4867, 208.23909, 2284.258, 2381.149]
2025-05-09 13:32:20,771 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [87.0, 862.0, 429.0, 1000.0, 1000.0, 508.0, 229.0, 105.0, 724.0, 824.0]
2025-05-09 13:32:20,771 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1226 [INFO]: New best (1689.34) for latency MM1Queue_a033_s075
2025-05-09 13:32:20,771 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1229 [INFO]: saving network
2025-05-09 13:32:20,780 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-mbpac_memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 13:32:20,797 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 23/100 (estimated time remaining: 14 hours, 2 minutes)
2025-05-09 13:41:30,271 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 13:41:30,419 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 13:42:37,103 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 895.02765 ± 734.179
2025-05-09 13:42:37,103 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [2835.716, 909.75134, 623.43713, 834.6368, 608.70215, 204.70047, 306.74066, 1502.1217, 401.59454, 722.8764]
2025-05-09 13:42:37,103 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [880.0, 268.0, 223.0, 298.0, 219.0, 95.0, 131.0, 458.0, 154.0, 249.0]
2025-05-09 13:42:37,107 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 24/100 (estimated time remaining: 13 hours, 53 minutes, 31 seconds)
2025-05-09 13:51:49,042 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 13:51:49,097 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 13:53:31,088 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 1356.20459 ± 1136.771
2025-05-09 13:53:31,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [167.63953, 3134.2795, 1957.6718, 260.22885, 1317.9493, 2205.929, 115.91303, 1207.9503, 3078.0059, 116.47955]
2025-05-09 13:53:31,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [80.0, 1000.0, 678.0, 117.0, 464.0, 712.0, 61.0, 411.0, 1000.0, 62.0]
2025-05-09 13:53:31,095 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 25/100 (estimated time remaining: 13 hours, 41 minutes, 52 seconds)
2025-05-09 14:01:54,766 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 14:01:55,140 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 14:04:05,061 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 1743.37048 ± 990.716
2025-05-09 14:04:05,063 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [2162.5107, 1546.0298, 248.41269, 3107.0698, 867.059, 1416.4353, 1171.8196, 809.62335, 3105.5847, 2999.159]
2025-05-09 14:04:05,063 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [724.0, 538.0, 123.0, 1000.0, 258.0, 483.0, 383.0, 293.0, 1000.0, 1000.0]
2025-05-09 14:04:05,063 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1226 [INFO]: New best (1743.37) for latency MM1Queue_a033_s075
2025-05-09 14:04:05,063 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1229 [INFO]: saving network
2025-05-09 14:04:05,072 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-mbpac_memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 14:04:05,088 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 26/100 (estimated time remaining: 13 hours, 31 minutes, 6 seconds)
2025-05-09 14:13:07,222 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 14:13:07,227 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 14:14:25,069 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 1003.90979 ± 805.974
2025-05-09 14:14:25,069 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [403.6583, 406.74945, 70.83004, 2545.3184, 1036.1702, 1019.64374, 115.510475, 1286.118, 868.97504, 2286.1238]
2025-05-09 14:14:25,069 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [159.0, 165.0, 42.0, 851.0, 362.0, 319.0, 61.0, 444.0, 304.0, 772.0]
2025-05-09 14:14:25,118 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 27/100 (estimated time remaining: 13 hours, 20 minutes, 48 seconds)
2025-05-09 14:23:11,767 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 14:23:11,770 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 14:24:24,497 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 950.74493 ± 831.520
2025-05-09 14:24:24,501 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [161.4615, 3057.0884, 1353.4657, 641.25543, 69.46652, 1099.5009, 92.21607, 1043.0465, 1150.4468, 839.5019]
2025-05-09 14:24:24,501 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [77.0, 1000.0, 425.0, 231.0, 45.0, 378.0, 51.0, 332.0, 387.0, 299.0]
2025-05-09 14:24:24,504 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 28/100 (estimated time remaining: 12 hours, 40 minutes, 6 seconds)
2025-05-09 14:33:30,569 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 14:33:30,574 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 14:35:18,496 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 1393.56226 ± 793.955
2025-05-09 14:35:18,499 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [958.9364, 1652.822, 441.6101, 1720.3607, 1710.9827, 1985.8834, 106.691345, 838.82043, 3007.6084, 1511.9071]
2025-05-09 14:35:18,499 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [336.0, 560.0, 168.0, 562.0, 567.0, 671.0, 57.0, 281.0, 1000.0, 511.0]
2025-05-09 14:35:18,503 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 29/100 (estimated time remaining: 12 hours, 38 minutes, 44 seconds)
2025-05-09 14:44:25,472 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 14:44:25,476 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 14:46:14,239 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 1452.12476 ± 1200.172
2025-05-09 14:46:14,242 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [92.50993, 597.5531, 118.27131, 3084.0706, 1198.6238, 1792.3396, 3113.9558, 3076.1255, 114.3206, 1333.4774]
2025-05-09 14:46:14,242 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [52.0, 222.0, 63.0, 1000.0, 410.0, 601.0, 1000.0, 1000.0, 61.0, 450.0]
2025-05-09 14:46:14,246 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 30/100 (estimated time remaining: 12 hours, 28 minutes, 36 seconds)
2025-05-09 14:55:07,979 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 14:55:07,983 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 14:56:39,690 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 1195.35547 ± 1188.386
2025-05-09 14:56:39,932 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [182.6083, 311.0955, 300.78912, 3127.1543, 120.49922, 863.47107, 120.91206, 1338.0148, 2456.7659, 3132.2444]
2025-05-09 14:56:39,932 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [91.0, 128.0, 125.0, 1000.0, 65.0, 280.0, 65.0, 436.0, 801.0, 1000.0]
2025-05-09 14:56:39,938 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 31/100 (estimated time remaining: 12 hours, 16 minutes, 7 seconds)
2025-05-09 15:05:54,511 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 15:05:55,334 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 15:07:42,403 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 1428.32446 ± 808.496
2025-05-09 15:07:42,407 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [1925.496, 981.2542, 1623.8887, 1077.7366, 339.05133, 3106.5, 178.0518, 1918.9884, 1361.5259, 1770.7526]
2025-05-09 15:07:42,407 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [596.0, 332.0, 538.0, 352.0, 142.0, 1000.0, 83.0, 587.0, 453.0, 575.0]
2025-05-09 15:07:42,411 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 32/100 (estimated time remaining: 12 hours, 15 minutes, 22 seconds)
2025-05-09 15:16:30,469 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 15:16:30,475 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 15:18:23,267 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 1513.72461 ± 980.907
2025-05-09 15:18:23,272 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [106.59487, 3118.6494, 1018.4277, 2317.303, 3099.4165, 1305.0538, 906.26025, 1380.8375, 1479.6464, 405.05643]
2025-05-09 15:18:23,272 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [57.0, 1000.0, 336.0, 758.0, 1000.0, 432.0, 308.0, 468.0, 498.0, 159.0]
2025-05-09 15:18:23,276 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 33/100 (estimated time remaining: 12 hours, 14 minutes, 7 seconds)
2025-05-09 15:27:13,243 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 15:27:13,248 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 15:29:52,694 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 2208.08667 ± 824.881
2025-05-09 15:29:52,698 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [2924.606, 1603.9867, 1753.8171, 2467.156, 3119.4124, 1040.1011, 2234.6765, 801.4726, 3057.414, 3078.2217]
2025-05-09 15:29:52,698 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [939.0, 543.0, 555.0, 787.0, 1000.0, 318.0, 716.0, 279.0, 1000.0, 1000.0]
2025-05-09 15:29:52,698 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1226 [INFO]: New best (2208.09) for latency MM1Queue_a033_s075
2025-05-09 15:29:52,698 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1229 [INFO]: saving network
2025-05-09 15:29:52,707 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-mbpac_memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 15:29:52,765 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 34/100 (estimated time remaining: 12 hours, 11 minutes, 15 seconds)
2025-05-09 15:38:45,318 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 15:38:45,687 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 15:40:29,078 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 1387.21350 ± 896.901
2025-05-09 15:40:29,412 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [101.975845, 1720.9652, 3211.0547, 949.80536, 330.01202, 1895.6654, 2023.2762, 1830.0829, 504.1996, 1305.098]
2025-05-09 15:40:29,412 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [59.0, 560.0, 1000.0, 319.0, 135.0, 628.0, 654.0, 582.0, 186.0, 420.0]
2025-05-09 15:40:29,416 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 35/100 (estimated time remaining: 11 hours, 56 minutes, 8 seconds)
2025-05-09 15:49:37,228 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 15:49:37,552 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 15:51:34,544 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 1537.01038 ± 1166.463
2025-05-09 15:51:34,547 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [2867.7693, 3111.649, 158.5577, 1739.9426, 370.06674, 136.39844, 2839.4958, 2522.0525, 380.01465, 1244.1573]
2025-05-09 15:51:34,547 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [943.0, 1000.0, 78.0, 575.0, 152.0, 69.0, 892.0, 819.0, 152.0, 436.0]
2025-05-09 15:51:34,551 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 36/100 (estimated time remaining: 11 hours, 53 minutes, 49 seconds)
2025-05-09 16:00:31,899 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 16:00:31,902 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 16:02:12,257 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 1301.53003 ± 1143.788
2025-05-09 16:02:12,259 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [2696.7627, 3103.5278, 3094.5557, 674.10614, 157.0636, 820.1597, 129.18008, 194.27756, 1202.2727, 943.3944]
2025-05-09 16:02:12,259 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [900.0, 1000.0, 1000.0, 243.0, 77.0, 299.0, 66.0, 91.0, 390.0, 332.0]
2025-05-09 16:02:12,263 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 37/100 (estimated time remaining: 11 hours, 37 minutes, 34 seconds)
2025-05-09 16:11:30,696 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 16:11:30,702 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 16:12:58,890 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 1122.18005 ± 950.638
2025-05-09 16:12:58,892 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [852.9019, 130.58366, 863.21936, 415.17767, 2126.9026, 166.74226, 585.1148, 2214.6687, 3093.9548, 772.53503]
2025-05-09 16:12:58,892 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [296.0, 66.0, 314.0, 164.0, 685.0, 81.0, 235.0, 754.0, 1000.0, 272.0]
2025-05-09 16:12:58,896 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 38/100 (estimated time remaining: 11 hours, 27 minutes, 52 seconds)
2025-05-09 16:21:48,205 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 16:21:48,918 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 16:23:13,559 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 1114.89648 ± 865.107
2025-05-09 16:23:13,564 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [123.21108, 3138.2708, 1647.7025, 566.31323, 636.3612, 1828.6086, 214.94106, 604.3997, 1185.0498, 1204.1063]
2025-05-09 16:23:13,564 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [67.0, 1000.0, 539.0, 211.0, 228.0, 590.0, 99.0, 218.0, 408.0, 394.0]
2025-05-09 16:23:13,569 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 39/100 (estimated time remaining: 11 hours, 1 minute, 29 seconds)
2025-05-09 16:32:14,127 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 16:32:14,132 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 16:33:43,658 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 1189.17896 ± 1030.568
2025-05-09 16:33:43,661 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [726.9547, 1498.5038, 680.5758, 971.2603, 140.27582, 101.315346, 998.3817, 573.59656, 3110.638, 3090.2883]
2025-05-09 16:33:43,661 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [224.0, 502.0, 247.0, 335.0, 70.0, 55.0, 358.0, 211.0, 1000.0, 1000.0]
2025-05-09 16:33:43,665 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 40/100 (estimated time remaining: 10 hours, 49 minutes, 29 seconds)
2025-05-09 16:42:34,026 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 16:42:34,096 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 16:45:42,628 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 2579.53271 ± 886.122
2025-05-09 16:45:42,631 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [134.55276, 3084.0288, 3005.5327, 3054.1523, 3200.5703, 2445.0662, 2101.9883, 3096.0513, 2534.976, 3138.4075]
2025-05-09 16:45:42,631 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [68.0, 1000.0, 1000.0, 1000.0, 1000.0, 794.0, 670.0, 1000.0, 812.0, 1000.0]
2025-05-09 16:45:42,631 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1226 [INFO]: New best (2579.53) for latency MM1Queue_a033_s075
2025-05-09 16:45:42,631 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1229 [INFO]: saving network
2025-05-09 16:45:42,640 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-mbpac_memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 16:45:42,658 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 41/100 (estimated time remaining: 10 hours, 49 minutes, 37 seconds)
2025-05-09 16:55:14,629 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 16:55:14,634 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 16:56:17,772 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 800.91028 ± 1134.955
2025-05-09 16:56:17,772 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [428.49994, 3101.5935, 212.79868, 99.2467, 210.31178, 114.89376, 169.85745, 3020.1155, 419.13132, 232.65381]
2025-05-09 16:56:17,772 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [167.0, 1000.0, 97.0, 54.0, 98.0, 60.0, 80.0, 1000.0, 161.0, 103.0]
2025-05-09 16:56:17,777 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 42/100 (estimated time remaining: 10 hours, 38 minutes, 17 seconds)
2025-05-09 17:04:47,486 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 17:04:47,489 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 17:06:10,263 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 1089.60547 ± 1003.248
2025-05-09 17:06:10,372 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [85.482834, 1110.6053, 149.0059, 145.42574, 351.75623, 1703.2081, 214.05103, 2023.7599, 2099.6836, 3013.076]
2025-05-09 17:06:10,373 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [50.0, 330.0, 73.0, 72.0, 156.0, 555.0, 97.0, 663.0, 678.0, 964.0]
2025-05-09 17:06:10,377 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 43/100 (estimated time remaining: 10 hours, 17 minutes, 1 second)
2025-05-09 17:15:27,819 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 17:15:27,826 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 17:17:05,955 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 1290.35132 ± 666.425
2025-05-09 17:17:05,962 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [175.41519, 2700.0435, 1318.788, 1644.9702, 935.07733, 658.36066, 1477.2183, 951.626, 1910.0197, 1131.9941]
2025-05-09 17:17:05,962 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [85.0, 870.0, 431.0, 541.0, 321.0, 242.0, 482.0, 328.0, 611.0, 379.0]
2025-05-09 17:17:05,967 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 44/100 (estimated time remaining: 10 hours, 14 minutes, 9 seconds)
2025-05-09 17:26:23,348 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 17:26:23,354 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 17:28:42,617 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 1899.69397 ± 1107.173
2025-05-09 17:28:42,622 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [2312.3955, 3069.5513, 1268.4203, 1347.5785, 3111.8228, 79.64169, 76.8502, 2300.9556, 3173.9204, 2255.8054]
2025-05-09 17:28:42,622 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [751.0, 1000.0, 413.0, 455.0, 1000.0, 49.0, 48.0, 740.0, 1000.0, 748.0]
2025-05-09 17:28:42,627 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 45/100 (estimated time remaining: 10 hours, 15 minutes, 48 seconds)
2025-05-09 17:37:15,344 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 17:37:15,347 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 17:39:58,648 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 2277.19141 ± 874.323
2025-05-09 17:39:58,657 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [1784.5117, 1461.4216, 963.0644, 3131.0027, 3127.2798, 1377.279, 3284.2263, 3059.286, 1543.0703, 3040.774]
2025-05-09 17:39:58,657 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [562.0, 451.0, 339.0, 1000.0, 1000.0, 473.0, 1000.0, 1000.0, 525.0, 1000.0]
2025-05-09 17:39:58,662 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 46/100 (estimated time remaining: 9 hours, 56 minutes, 56 seconds)
2025-05-09 17:48:58,886 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 17:48:59,051 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 17:50:07,299 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 902.87189 ± 779.884
2025-05-09 17:50:07,299 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [2653.7983, 785.78625, 1589.2115, 697.4657, 601.4677, 390.9979, 129.68217, 185.94745, 1706.7725, 287.5894]
2025-05-09 17:50:07,299 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [818.0, 259.0, 523.0, 244.0, 219.0, 151.0, 66.0, 90.0, 571.0, 125.0]
2025-05-09 17:50:07,653 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 47/100 (estimated time remaining: 9 hours, 41 minutes, 22 seconds)
2025-05-09 17:59:10,589 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 17:59:10,840 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 18:00:59,252 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 1437.45532 ± 862.838
2025-05-09 18:00:59,255 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [2698.2395, 305.3453, 576.49243, 1737.9537, 1358.8513, 1375.5459, 775.25397, 3022.7046, 1861.6796, 662.4861]
2025-05-09 18:00:59,255 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [874.0, 129.0, 209.0, 567.0, 445.0, 462.0, 274.0, 1000.0, 584.0, 235.0]
2025-05-09 18:00:59,259 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 48/100 (estimated time remaining: 9 hours, 41 minutes, 2 seconds)
2025-05-09 18:10:05,248 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 18:10:05,630 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 18:11:49,458 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 1375.28516 ± 1234.710
2025-05-09 18:11:49,461 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [114.61783, 202.33601, 3161.8696, 1164.0706, 180.74095, 1060.3558, 107.93989, 3002.4958, 1589.3804, 3169.0427]
2025-05-09 18:11:49,461 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [62.0, 96.0, 1000.0, 369.0, 86.0, 349.0, 58.0, 1000.0, 516.0, 1000.0]
2025-05-09 18:11:49,466 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 49/100 (estimated time remaining: 9 hours, 29 minutes, 8 seconds)
2025-05-09 18:20:36,131 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 18:20:36,136 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 18:21:52,181 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 973.41687 ± 955.944
2025-05-09 18:21:52,492 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [129.17998, 3087.1826, 154.5276, 578.4955, 1952.7871, 379.59012, 145.82347, 748.57635, 609.3124, 1948.6931]
2025-05-09 18:21:52,492 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [66.0, 1000.0, 77.0, 214.0, 639.0, 149.0, 75.0, 274.0, 228.0, 629.0]
2025-05-09 18:21:52,498 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 50/100 (estimated time remaining: 9 hours, 2 minutes, 16 seconds)
2025-05-09 18:30:23,141 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 18:30:23,146 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 18:33:13,683 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 2326.90674 ± 823.024
2025-05-09 18:33:13,687 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [2968.4421, 1998.5193, 2540.7703, 3029.6077, 3085.4102, 2841.1584, 1249.9318, 632.6771, 3097.1096, 1825.4403]
2025-05-09 18:33:13,688 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [979.0, 642.0, 806.0, 1000.0, 1000.0, 912.0, 420.0, 230.0, 1000.0, 592.0]
2025-05-09 18:33:13,693 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 51/100 (estimated time remaining: 8 hours, 52 minutes, 30 seconds)
2025-05-09 18:43:15,895 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 18:43:15,904 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 18:44:44,959 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 1101.91919 ± 905.936
2025-05-09 18:44:44,959 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [3096.9072, 1996.353, 746.30646, 625.5011, 1491.4889, 579.659, 99.50727, 514.1959, 1723.0011, 146.27122]
2025-05-09 18:44:44,959 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 667.0, 269.0, 228.0, 513.0, 213.0, 55.0, 191.0, 569.0, 74.0]
2025-05-09 18:44:44,966 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 52/100 (estimated time remaining: 8 hours, 55 minutes, 17 seconds)
2025-05-09 18:54:03,441 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 18:54:03,878 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 18:55:30,081 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 1070.37073 ± 687.535
2025-05-09 18:55:30,449 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [2203.2263, 1648.5172, 1549.4781, 661.5357, 292.66165, 1395.51, 889.3812, 175.29955, 1697.83, 190.26692]
2025-05-09 18:55:30,449 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [713.0, 553.0, 490.0, 229.0, 123.0, 487.0, 305.0, 83.0, 559.0, 89.0]
2025-05-09 18:55:30,455 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 53/100 (estimated time remaining: 8 hours, 43 minutes, 23 seconds)
2025-05-09 19:04:40,915 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 19:04:40,919 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 19:06:54,707 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 1804.76038 ± 1150.250
2025-05-09 19:06:54,870 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [2538.1326, 406.99695, 3157.8853, 1060.3667, 664.0187, 535.69415, 740.79645, 3146.0005, 2656.7405, 3140.9734]
2025-05-09 19:06:54,870 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [773.0, 157.0, 1000.0, 381.0, 236.0, 197.0, 266.0, 1000.0, 815.0, 1000.0]
2025-05-09 19:06:54,877 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 54/100 (estimated time remaining: 8 hours, 37 minutes, 50 seconds)
2025-05-09 19:16:18,678 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 19:16:18,683 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 19:18:19,081 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 1635.18298 ± 646.214
2025-05-09 19:18:19,084 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [1625.0181, 1734.6528, 1169.7927, 1753.74, 3089.7087, 1344.4899, 393.64575, 1452.5442, 1779.502, 2008.7363]
2025-05-09 19:18:19,084 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [514.0, 535.0, 342.0, 559.0, 1000.0, 432.0, 154.0, 505.0, 562.0, 624.0]
2025-05-09 19:18:19,090 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 55/100 (estimated time remaining: 8 hours, 39 minutes, 16 seconds)
2025-05-09 19:27:57,268 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 19:27:57,273 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 19:29:23,497 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 1118.85815 ± 749.134
2025-05-09 19:29:23,498 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [1767.026, 1068.499, 806.82135, 1978.2074, 83.28517, 637.1499, 528.4237, 217.191, 2324.5872, 1777.3907]
2025-05-09 19:29:23,498 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [574.0, 356.0, 279.0, 622.0, 49.0, 225.0, 191.0, 96.0, 718.0, 566.0]
2025-05-09 19:29:23,517 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 56/100 (estimated time remaining: 8 hours, 25 minutes, 28 seconds)
2025-05-09 19:38:39,793 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 19:38:39,796 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 19:40:13,869 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 1266.95508 ± 1087.835
2025-05-09 19:40:14,058 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [128.50874, 924.2817, 899.34814, 409.20166, 120.759636, 398.10483, 1724.9867, 1790.0319, 3105.0037, 3169.323]
2025-05-09 19:40:14,058 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [65.0, 279.0, 306.0, 156.0, 63.0, 148.0, 534.0, 552.0, 944.0, 1000.0]
2025-05-09 19:40:14,064 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 57/100 (estimated time remaining: 8 hours, 8 minutes, 16 seconds)
2025-05-09 19:50:12,101 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 19:50:12,107 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 19:52:14,551 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 1631.23962 ± 1049.101
2025-05-09 19:52:14,554 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [3212.724, 811.03906, 1216.3997, 227.71817, 1230.6001, 1379.3943, 719.38513, 3112.0913, 1255.0659, 3147.978]
2025-05-09 19:52:14,554 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [981.0, 289.0, 388.0, 101.0, 383.0, 434.0, 241.0, 1000.0, 396.0, 1000.0]
2025-05-09 19:52:14,560 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 58/100 (estimated time remaining: 8 hours, 7 minutes, 55 seconds)
2025-05-09 20:01:13,988 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 20:01:13,994 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 20:03:41,519 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 1936.76013 ± 1229.189
2025-05-09 20:03:41,523 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [1740.3922, 113.43135, 3044.3765, 3296.903, 3091.4673, 1108.5944, 891.53485, 2890.266, 109.53111, 3081.1052]
2025-05-09 20:03:41,523 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [567.0, 60.0, 943.0, 1000.0, 951.0, 340.0, 302.0, 932.0, 58.0, 982.0]
2025-05-09 20:03:41,530 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 59/100 (estimated time remaining: 7 hours, 56 minutes, 55 seconds)
2025-05-09 20:14:10,152 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 20:14:10,159 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 20:17:33,887 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 2709.57007 ± 779.974
2025-05-09 20:17:33,891 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [3152.3923, 3171.7346, 3149.4006, 3155.3003, 1615.3492, 814.8628, 3210.5535, 3155.4, 2916.474, 2754.2341]
2025-05-09 20:17:33,891 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 527.0, 284.0, 1000.0, 1000.0, 922.0, 867.0]
2025-05-09 20:17:33,891 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1226 [INFO]: New best (2709.57) for latency MM1Queue_a033_s075
2025-05-09 20:17:33,891 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1229 [INFO]: saving network
2025-05-09 20:17:33,902 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-mbpac_memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 20:17:33,923 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 60/100 (estimated time remaining: 8 hours, 5 minutes, 49 seconds)
2025-05-09 20:26:39,823 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 20:26:39,827 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 20:28:59,861 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 1823.96216 ± 980.503
2025-05-09 20:29:00,098 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [239.75154, 2586.4343, 980.362, 2764.8474, 3163.5273, 1464.4939, 3218.1553, 975.9202, 1517.4003, 1328.7292]
2025-05-09 20:29:00,098 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [108.0, 830.0, 335.0, 873.0, 1000.0, 463.0, 1000.0, 332.0, 479.0, 440.0]
2025-05-09 20:29:00,104 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 61/100 (estimated time remaining: 7 hours, 56 minutes, 52 seconds)
2025-05-09 20:38:25,356 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 20:38:25,368 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 20:41:06,119 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 2221.52661 ± 1269.412
2025-05-09 20:41:06,355 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [915.38055, 3256.5374, 3213.7422, 3238.3992, 1482.6614, 3259.262, 362.66208, 3136.9702, 3224.7092, 124.94159]
2025-05-09 20:41:06,355 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [307.0, 1000.0, 1000.0, 1000.0, 458.0, 1000.0, 145.0, 1000.0, 1000.0, 63.0]
2025-05-09 20:41:06,362 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 62/100 (estimated time remaining: 7 hours, 54 minutes, 47 seconds)
2025-05-09 20:50:20,175 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 20:50:20,208 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 20:52:15,064 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 1476.47571 ± 1110.564
2025-05-09 20:52:15,067 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [1992.644, 961.4267, 1022.87115, 3219.3442, 3210.975, 183.76653, 142.004, 701.7615, 2509.6187, 820.3449]
2025-05-09 20:52:15,067 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [643.0, 303.0, 315.0, 1000.0, 1000.0, 88.0, 70.0, 244.0, 769.0, 290.0]
2025-05-09 20:52:15,074 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 63/100 (estimated time remaining: 7 hours, 36 minutes, 3 seconds)
2025-05-09 21:01:22,752 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 21:01:22,757 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 21:03:36,004 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 1823.99414 ± 939.620
2025-05-09 21:03:36,126 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [2945.4272, 346.19815, 2403.0757, 1063.0111, 1962.2081, 2119.119, 2413.0654, 3237.4382, 1207.9689, 542.43005]
2025-05-09 21:03:36,126 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [893.0, 143.0, 740.0, 352.0, 624.0, 656.0, 766.0, 1000.0, 391.0, 196.0]
2025-05-09 21:03:36,132 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 64/100 (estimated time remaining: 7 hours, 23 minutes, 20 seconds)
2025-05-09 21:12:34,074 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 21:12:34,078 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 21:14:55,543 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 1961.33008 ± 1132.234
2025-05-09 21:14:55,547 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [1428.7372, 356.21423, 1483.6152, 3198.4714, 1222.717, 3255.384, 2898.2942, 149.25012, 2375.822, 3244.7952]
2025-05-09 21:14:55,547 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [464.0, 144.0, 465.0, 1000.0, 409.0, 1000.0, 900.0, 74.0, 741.0, 1000.0]
2025-05-09 21:14:55,553 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 65/100 (estimated time remaining: 6 hours, 52 minutes, 59 seconds)
2025-05-09 21:23:46,001 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 21:23:46,006 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 21:26:27,725 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 2302.65771 ± 945.934
2025-05-09 21:26:28,564 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [1305.7938, 2842.232, 3239.2883, 1885.2986, 3159.7327, 3255.6003, 2051.3872, 211.60828, 2997.5305, 2078.103]
2025-05-09 21:26:28,564 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [401.0, 857.0, 1000.0, 571.0, 1000.0, 997.0, 615.0, 98.0, 912.0, 640.0]
2025-05-09 21:26:28,623 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 66/100 (estimated time remaining: 6 hours, 42 minutes, 19 seconds)
2025-05-09 21:35:49,778 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 21:35:49,782 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 21:38:03,906 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 1881.80664 ± 1183.637
2025-05-09 21:38:04,057 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [1460.8973, 187.15135, 1294.5927, 2174.1118, 3179.857, 3233.9104, 790.9637, 236.62462, 3079.2573, 3180.7]
2025-05-09 21:38:04,057 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [470.0, 86.0, 397.0, 680.0, 1000.0, 1000.0, 276.0, 105.0, 907.0, 1000.0]
2025-05-09 21:38:04,063 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 67/100 (estimated time remaining: 6 hours, 27 minutes, 20 seconds)
2025-05-09 21:46:32,009 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 21:46:32,282 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 21:49:11,111 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 2198.27686 ± 1322.174
2025-05-09 21:49:11,169 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [3222.032, 314.04785, 3243.0283, 3208.9197, 363.24997, 1722.6316, 208.71323, 3229.596, 3279.2046, 3191.344]
2025-05-09 21:49:11,169 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 137.0, 1000.0, 1000.0, 146.0, 538.0, 98.0, 1000.0, 1000.0, 1000.0]
2025-05-09 21:49:11,176 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 68/100 (estimated time remaining: 6 hours, 15 minutes, 46 seconds)
2025-05-09 21:58:13,083 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 21:58:13,191 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 22:01:12,778 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 2578.57935 ± 769.838
2025-05-09 22:01:13,144 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [3228.2654, 906.0012, 1719.3519, 3200.9185, 2324.8743, 1971.7916, 3014.6846, 3066.2622, 3127.0486, 3226.5928]
2025-05-09 22:01:13,144 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 294.0, 546.0, 1000.0, 715.0, 579.0, 933.0, 942.0, 968.0, 1000.0]
2025-05-09 22:01:13,151 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 69/100 (estimated time remaining: 6 hours, 8 minutes, 44 seconds)
2025-05-09 22:10:29,350 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 22:10:29,358 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 22:12:35,246 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 1715.11267 ± 892.803
2025-05-09 22:12:35,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [1320.915, 3111.4292, 3128.8633, 2060.487, 1343.1785, 2610.035, 1137.2339, 836.4527, 673.8515, 928.6797]
2025-05-09 22:12:35,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [387.0, 1000.0, 1000.0, 654.0, 418.0, 842.0, 380.0, 291.0, 229.0, 316.0]
2025-05-09 22:12:35,258 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 70/100 (estimated time remaining: 5 hours, 57 minutes, 30 seconds)
2025-05-09 22:21:47,533 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 22:21:47,539 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 22:24:56,227 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 2733.81177 ± 874.638
2025-05-09 22:24:56,232 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [3228.7087, 3245.6528, 2974.1265, 3227.2866, 3296.8438, 358.11017, 3242.106, 3198.8472, 2289.979, 2276.4568]
2025-05-09 22:24:56,232 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 903.0, 1000.0, 1000.0, 145.0, 1000.0, 1000.0, 709.0, 713.0]
2025-05-09 22:24:56,232 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1226 [INFO]: New best (2733.81) for latency MM1Queue_a033_s075
2025-05-09 22:24:56,232 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1229 [INFO]: saving network
2025-05-09 22:24:56,242 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-mbpac_memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 22:24:56,264 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 71/100 (estimated time remaining: 5 hours, 50 minutes, 45 seconds)
2025-05-09 22:33:16,057 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 22:33:16,065 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 22:35:27,867 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 1888.02576 ± 1284.620
2025-05-09 22:35:27,872 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [1562.3036, 2995.4265, 3176.2559, 3283.1484, 67.59849, 2014.2966, 109.76974, 2355.1587, 117.34441, 3198.954]
2025-05-09 22:35:27,872 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [493.0, 862.0, 1000.0, 970.0, 41.0, 650.0, 57.0, 743.0, 64.0, 1000.0]
2025-05-09 22:35:27,879 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 72/100 (estimated time remaining: 5 hours, 32 minutes, 54 seconds)
2025-05-09 22:44:56,876 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 22:44:56,882 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 22:47:14,347 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 1869.35779 ± 1116.165
2025-05-09 22:47:14,351 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [3238.7769, 907.8278, 617.43726, 1391.8146, 884.33997, 303.80243, 2189.8174, 2755.3062, 3178.496, 3225.9607]
2025-05-09 22:47:14,351 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 313.0, 225.0, 452.0, 295.0, 130.0, 624.0, 867.0, 1000.0, 1000.0]
2025-05-09 22:47:14,358 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 73/100 (estimated time remaining: 5 hours, 25 minutes, 5 seconds)
2025-05-09 22:56:34,975 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 22:56:34,979 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 22:59:24,470 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 2360.60547 ± 1040.409
2025-05-09 22:59:24,476 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [1143.0466, 3244.5676, 3199.227, 3040.7888, 3249.0522, 843.71423, 959.8301, 3268.8293, 1458.0625, 3198.9382]
2025-05-09 22:59:24,476 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [383.0, 1000.0, 1000.0, 956.0, 1000.0, 293.0, 333.0, 1000.0, 477.0, 1000.0]
2025-05-09 22:59:24,483 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 74/100 (estimated time remaining: 5 hours, 14 minutes, 13 seconds)
2025-05-09 23:07:55,424 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 23:07:55,765 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 23:10:23,908 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 2061.60181 ± 1213.898
2025-05-09 23:10:23,912 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [3243.6943, 1716.5938, 3244.1003, 198.84659, 2829.7644, 113.8992, 3158.1868, 894.7166, 1973.3331, 3242.8804]
2025-05-09 23:10:23,912 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 565.0, 1000.0, 92.0, 876.0, 59.0, 1000.0, 304.0, 651.0, 1000.0]
2025-05-09 23:10:23,919 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 75/100 (estimated time remaining: 5 hours, 37 seconds)
2025-05-09 23:19:11,422 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 23:19:11,426 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 23:21:20,071 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 1784.14917 ± 1172.828
2025-05-09 23:21:20,075 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [922.90106, 3360.7756, 1353.6334, 3183.6616, 904.8783, 171.92113, 3251.9094, 2641.7322, 328.93887, 1721.1426]
2025-05-09 23:21:20,075 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [319.0, 1000.0, 440.0, 1000.0, 308.0, 85.0, 1000.0, 828.0, 136.0, 547.0]
2025-05-09 23:21:20,082 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 76/100 (estimated time remaining: 4 hours, 41 minutes, 59 seconds)
2025-05-09 23:30:43,096 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 23:30:43,103 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 23:32:57,975 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 1959.67310 ± 1088.356
2025-05-09 23:32:57,978 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [2933.4578, 2169.8853, 960.37256, 194.20244, 2690.8508, 2924.264, 1478.4716, 2685.6697, 3271.2292, 288.32733]
2025-05-09 23:32:57,979 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [895.0, 630.0, 316.0, 89.0, 834.0, 898.0, 465.0, 823.0, 1000.0, 122.0]
2025-05-09 23:32:57,985 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 77/100 (estimated time remaining: 4 hours, 36 minutes)
2025-05-09 23:41:50,034 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 23:41:50,042 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 23:43:57,239 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 1817.32776 ± 1025.108
2025-05-09 23:43:57,242 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [1266.3568, 3245.7231, 1334.1785, 1869.868, 181.27545, 1077.3531, 1624.8328, 3242.418, 1071.6534, 3259.6177]
2025-05-09 23:43:57,242 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [381.0, 1000.0, 440.0, 566.0, 87.0, 358.0, 524.0, 1000.0, 356.0, 1000.0]
2025-05-09 23:43:57,249 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 78/100 (estimated time remaining: 4 hours, 20 minutes, 53 seconds)
2025-05-09 23:52:51,381 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 23:52:51,387 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 23:55:52,824 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 2625.29346 ± 771.536
2025-05-09 23:55:52,827 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [2769.8257, 3214.9253, 1669.6957, 3178.1084, 1786.6716, 3251.2454, 3255.579, 3211.5935, 2862.5593, 1052.7314]
2025-05-09 23:55:52,827 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [854.0, 1000.0, 533.0, 1000.0, 567.0, 1000.0, 1000.0, 1000.0, 884.0, 347.0]
2025-05-09 23:55:52,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 79/100 (estimated time remaining: 4 hours, 8 minutes, 28 seconds)
2025-05-10 00:04:27,274 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 00:04:27,277 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 00:06:17,010 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 1553.60864 ± 1115.649
2025-05-10 00:06:17,015 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [390.41614, 1179.5845, 1524.4987, 380.80774, 3208.3447, 2110.9282, 3281.8982, 158.50534, 2538.427, 762.676]
2025-05-10 00:06:17,015 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [152.0, 387.0, 448.0, 150.0, 1000.0, 672.0, 1000.0, 77.0, 792.0, 264.0]
2025-05-10 00:06:17,022 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 80/100 (estimated time remaining: 3 hours, 54 minutes, 43 seconds)
2025-05-10 00:15:16,046 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 00:15:16,051 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 00:17:05,497 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 1503.69873 ± 940.733
2025-05-10 00:17:05,502 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [372.82376, 1606.5814, 2293.3096, 216.45996, 437.30267, 2408.4019, 1989.1073, 910.5353, 1651.0444, 3151.421]
2025-05-10 00:17:05,502 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [150.0, 516.0, 724.0, 108.0, 168.0, 760.0, 633.0, 306.0, 553.0, 1000.0]
2025-05-10 00:17:05,509 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 81/100 (estimated time remaining: 3 hours, 43 minutes, 1 second)
2025-05-10 00:25:53,648 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 00:25:53,654 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 00:28:17,718 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 2031.84827 ± 1165.066
2025-05-10 00:28:18,087 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [3102.1748, 3264.607, 796.6228, 3298.0347, 417.34625, 1418.2772, 254.79846, 2414.6199, 2100.63, 3251.371]
2025-05-10 00:28:18,087 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [952.0, 1000.0, 259.0, 1000.0, 160.0, 457.0, 112.0, 743.0, 619.0, 1000.0]
2025-05-10 00:28:18,608 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 82/100 (estimated time remaining: 3 hours, 30 minutes, 18 seconds)
2025-05-10 00:37:48,513 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 00:37:48,518 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 00:41:07,159 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 2827.82690 ± 970.766
2025-05-10 00:41:07,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [3299.261, 355.2029, 3300.6165, 3314.8884, 3312.0232, 3258.9607, 3303.476, 1574.1986, 3268.9722, 3290.6702]
2025-05-10 00:41:07,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 142.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 494.0, 972.0, 1000.0]
2025-05-10 00:41:07,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1226 [INFO]: New best (2827.83) for latency MM1Queue_a033_s075
2025-05-10 00:41:07,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1229 [INFO]: saving network
2025-05-10 00:41:07,176 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-mbpac_memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 00:41:07,197 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 83/100 (estimated time remaining: 3 hours, 25 minutes, 47 seconds)
2025-05-10 00:49:38,205 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 00:49:38,215 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 00:52:30,234 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 2491.85913 ± 685.498
2025-05-10 00:52:30,563 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [3264.7683, 3248.3843, 3335.2725, 1985.5774, 1912.2587, 3278.2512, 1963.2268, 1540.9254, 1843.3066, 2546.621]
2025-05-10 00:52:30,563 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 606.0, 610.0, 1000.0, 603.0, 463.0, 579.0, 798.0]
2025-05-10 00:52:30,571 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 84/100 (estimated time remaining: 3 hours, 12 minutes, 32 seconds)
2025-05-10 01:01:03,554 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 01:01:03,560 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 01:04:28,923 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 2926.20996 ± 845.756
2025-05-10 01:04:29,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [2877.2913, 3242.8975, 3287.8154, 3199.2385, 3247.2244, 3265.3389, 3255.0322, 3214.1912, 3261.5186, 411.55203]
2025-05-10 01:04:29,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [892.0, 1000.0, 1000.0, 954.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 159.0]
2025-05-10 01:04:29,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1226 [INFO]: New best (2926.21) for latency MM1Queue_a033_s075
2025-05-10 01:04:29,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1229 [INFO]: saving network
2025-05-10 01:04:29,131 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-mbpac_memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 01:04:29,151 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 85/100 (estimated time remaining: 3 hours, 6 minutes, 14 seconds)
2025-05-10 01:13:21,119 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 01:13:21,125 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 01:15:51,455 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 2214.76367 ± 1003.288
2025-05-10 01:15:51,463 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [2985.872, 3319.3003, 1652.3353, 3378.6077, 3320.3223, 653.12726, 2799.2402, 953.5647, 1747.19, 1338.0787]
2025-05-10 01:15:51,463 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [887.0, 1000.0, 497.0, 1000.0, 1000.0, 228.0, 833.0, 318.0, 533.0, 424.0]
2025-05-10 01:15:51,471 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 86/100 (estimated time remaining: 2 hours, 56 minutes, 17 seconds)
2025-05-10 01:24:50,321 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 01:24:50,326 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 01:27:12,235 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 2011.91211 ± 1131.024
2025-05-10 01:27:12,294 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [2850.7466, 1309.7325, 3262.6868, 87.18449, 2639.424, 2720.556, 1459.688, 2321.8982, 3294.778, 172.42514]
2025-05-10 01:27:12,294 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [867.0, 388.0, 948.0, 51.0, 823.0, 841.0, 473.0, 704.0, 1000.0, 84.0]
2025-05-10 01:27:12,303 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 87/100 (estimated time remaining: 2 hours, 44 minutes, 54 seconds)
2025-05-10 01:36:14,683 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 01:36:14,690 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 01:39:10,572 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 2551.93237 ± 914.329
2025-05-10 01:39:10,577 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [3159.5315, 3315.8318, 2946.3953, 2186.1638, 954.0706, 904.22675, 3316.8748, 3305.9553, 3282.0671, 2148.2095]
2025-05-10 01:39:10,577 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [932.0, 1000.0, 872.0, 689.0, 307.0, 269.0, 1000.0, 1000.0, 1000.0, 616.0]
2025-05-10 01:39:10,584 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 88/100 (estimated time remaining: 2 hours, 30 minutes, 56 seconds)
2025-05-10 01:48:20,645 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 01:48:20,908 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 01:50:27,061 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 1779.33984 ± 1231.807
2025-05-10 01:50:27,188 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [917.1147, 3324.8423, 112.80371, 3238.3867, 3279.1755, 2059.2012, 139.32607, 1762.8157, 495.43668, 2464.2961]
2025-05-10 01:50:27,188 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [323.0, 1000.0, 62.0, 1000.0, 1000.0, 648.0, 73.0, 553.0, 191.0, 754.0]
2025-05-10 01:50:27,195 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 89/100 (estimated time remaining: 2 hours, 19 minutes, 3 seconds)
2025-05-10 01:59:29,073 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 01:59:29,080 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 02:02:54,204 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 2925.58105 ± 497.101
2025-05-10 02:02:54,208 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [3287.919, 1780.0415, 3268.2324, 3014.5554, 3242.5454, 3193.9028, 3245.3591, 3261.1548, 2236.4412, 2725.659]
2025-05-10 02:02:54,208 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 544.0, 1000.0, 957.0, 1000.0, 1000.0, 1000.0, 1000.0, 707.0, 852.0]
2025-05-10 02:02:54,216 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 90/100 (estimated time remaining: 2 hours, 8 minutes, 31 seconds)
2025-05-10 02:11:45,916 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 02:11:45,921 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 02:13:54,460 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 1850.83655 ± 1244.274
2025-05-10 02:13:54,466 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [3270.3777, 3003.856, 29.654463, 2866.1223, 62.564896, 2079.0376, 1345.1726, 379.94803, 3262.3796, 2209.2532]
2025-05-10 02:13:54,466 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 885.0, 23.0, 884.0, 38.0, 633.0, 433.0, 152.0, 1000.0, 694.0]
2025-05-10 02:13:54,473 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 91/100 (estimated time remaining: 1 hour, 56 minutes, 6 seconds)
2025-05-10 02:22:53,782 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 02:22:54,220 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 02:25:34,568 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 2248.98608 ± 1300.847
2025-05-10 02:25:34,698 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [364.79526, 3223.7722, 3289.3135, 116.00256, 3244.0862, 3277.0737, 3243.2444, 691.3047, 3288.443, 1751.8267]
2025-05-10 02:25:34,698 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [148.0, 1000.0, 1000.0, 63.0, 1000.0, 1000.0, 1000.0, 246.0, 1000.0, 550.0]
2025-05-10 02:25:34,706 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 92/100 (estimated time remaining: 1 hour, 45 minutes, 4 seconds)
2025-05-10 02:33:43,649 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 02:33:43,710 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 02:36:46,518 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 2641.45361 ± 954.320
2025-05-10 02:36:46,527 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [1679.1156, 537.25665, 3316.2812, 3475.0264, 3327.6565, 3269.447, 3316.7854, 2316.1, 1823.4203, 3353.4485]
2025-05-10 02:36:46,527 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [530.0, 194.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 713.0, 557.0, 1000.0]
2025-05-10 02:36:46,535 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 93/100 (estimated time remaining: 1 hour, 32 minutes, 9 seconds)
2025-05-10 02:45:49,310 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 02:45:49,316 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 02:48:41,535 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 2520.55029 ± 889.736
2025-05-10 02:48:41,558 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [2410.6187, 3012.489, 2568.0615, 3322.4224, 3289.4353, 2430.635, 2096.4714, 2994.0305, 110.05351, 2971.2874]
2025-05-10 02:48:41,558 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [728.0, 916.0, 798.0, 1000.0, 1000.0, 760.0, 659.0, 901.0, 58.0, 901.0]
2025-05-10 02:48:41,566 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 21 minutes, 32 seconds)
2025-05-10 02:57:53,058 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 02:57:53,090 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 03:01:27,776 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 3122.52734 ± 417.372
2025-05-10 03:01:27,780 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [3320.1719, 3297.8645, 3291.2427, 3281.3232, 2701.0315, 3318.3706, 3359.7358, 3382.6355, 3271.283, 2001.6151]
2025-05-10 03:01:27,780 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 815.0, 1000.0, 1000.0, 1000.0, 1000.0, 635.0]
2025-05-10 03:01:27,780 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1226 [INFO]: New best (3122.53) for latency MM1Queue_a033_s075
2025-05-10 03:01:27,780 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1229 [INFO]: saving network
2025-05-10 03:01:27,790 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-hopper/MM1Queue_a033_s075-mbpac_memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 03:01:27,812 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 10 minutes, 16 seconds)
2025-05-10 03:09:58,129 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 03:09:58,529 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 03:12:30,169 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 2141.64795 ± 1012.927
2025-05-10 03:12:30,174 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [1482.1299, 3227.2488, 1037.4402, 3279.9941, 3304.3933, 2824.9836, 553.3223, 2238.0754, 883.2762, 2585.6172]
2025-05-10 03:12:30,174 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [474.0, 1000.0, 342.0, 1000.0, 1000.0, 864.0, 205.0, 673.0, 297.0, 787.0]
2025-05-10 03:12:30,182 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 96/100 (estimated time remaining: 58 minutes, 35 seconds)
2025-05-10 03:21:32,940 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 03:21:32,949 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 03:24:04,542 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 2248.61084 ± 1063.176
2025-05-10 03:24:04,545 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [1769.661, 3336.9028, 3323.7185, 3358.347, 1242.1989, 2771.8853, 223.28453, 1755.7534, 3313.7683, 1390.5927]
2025-05-10 03:24:04,545 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [521.0, 1000.0, 1000.0, 1000.0, 404.0, 844.0, 101.0, 516.0, 1000.0, 431.0]
2025-05-10 03:24:04,553 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 97/100 (estimated time remaining: 46 minutes, 47 seconds)
2025-05-10 03:32:46,986 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 03:32:46,992 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 03:35:55,503 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 2750.44922 ± 1112.087
2025-05-10 03:35:55,507 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [3341.977, 3375.867, 3296.6382, 3100.934, 3338.0332, 124.17897, 1008.526, 3324.5317, 3324.4917, 3269.314]
2025-05-10 03:35:55,507 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 996.0, 1000.0, 65.0, 341.0, 1000.0, 1000.0, 1000.0]
2025-05-10 03:35:55,515 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 98/100 (estimated time remaining: 35 minutes, 29 seconds)
2025-05-10 03:45:03,916 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 03:45:03,922 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 03:47:16,786 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 1896.32263 ± 1387.994
2025-05-10 03:47:16,790 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [95.82244, 3269.6619, 638.68134, 3087.386, 3238.898, 388.49173, 1516.2019, 152.17636, 3262.9702, 3312.936]
2025-05-10 03:47:16,790 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [52.0, 1000.0, 230.0, 946.0, 1000.0, 153.0, 484.0, 75.0, 1000.0, 1000.0]
2025-05-10 03:47:16,799 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 99/100 (estimated time remaining: 23 minutes, 26 seconds)
2025-05-10 03:55:50,826 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 03:55:50,833 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 03:59:02,887 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 2765.85181 ± 980.695
2025-05-10 03:59:03,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [3218.3047, 3180.3538, 3290.4768, 3307.1702, 3280.8015, 3284.0286, 397.73587, 3195.0476, 3206.1123, 1298.4861]
2025-05-10 03:59:03,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 157.0, 1000.0, 1000.0, 422.0]
2025-05-10 03:59:03,131 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1199 [INFO]: Iteration 100/100 (estimated time remaining: 11 minutes, 31 seconds)
2025-05-10 04:08:33,324 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 04:08:33,330 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 04:11:02,997 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1221 [DEBUG]: Total Reward: 2228.67578 ± 1023.531
2025-05-10 04:11:03,001 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1222 [DEBUG]: All rewards: [2713.8064, 3176.952, 3296.4092, 3229.8645, 227.3366, 2228.104, 580.36365, 2480.8433, 2628.1062, 1724.9727]
2025-05-10 04:11:03,001 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1223 [DEBUG]: All trajectory lengths: [825.0, 898.0, 1000.0, 1000.0, 100.0, 652.0, 202.0, 754.0, 763.0, 537.0]
2025-05-10 04:11:03,009 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1251 [DEBUG]: Training session finished
