2025-05-09 09:43:40,797 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-mbpac_memdelay
2025-05-09 09:43:40,797 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-mbpac_memdelay
2025-05-09 09:43:40,797 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x1457bf5118d0>}
2025-05-09 09:43:40,798 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1111 [DEBUG]: using device: cuda
2025-05-09 09:43:40,804 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1133 [INFO]: Creating new trainer
2025-05-09 09:43:40,820 baseline-mbpac-noisy-halfcheetah:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-05-09 09:43:40,821 baseline-mbpac-noisy-halfcheetah:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-09 09:43:40,828 baseline-mbpac-noisy-halfcheetah:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=6, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 384, batch_first=True)
)
2025-05-09 09:43:41,575 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1194 [DEBUG]: Starting training session...
2025-05-09 09:43:41,575 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 1/100
2025-05-09 09:52:26,576 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 09:52:26,581 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 09:56:16,512 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -297.67822 ± 23.913
2025-05-09 09:56:16,515 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-281.74976, -265.24942, -313.5431, -302.64246, -307.6747, -283.88217, -338.9069, -329.0625, -289.47318, -264.59814]
2025-05-09 09:56:16,515 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 09:56:16,515 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1226 [INFO]: New best (-297.68) for latency MM1Queue_a033_s075
2025-05-09 09:56:16,515 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-09 09:56:16,524 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-mbpac_memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 09:56:17,009 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 20 hours, 46 minutes, 27 seconds)
2025-05-09 10:05:31,857 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 10:05:31,862 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 10:09:16,535 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -405.84488 ± 94.069
2025-05-09 10:09:16,896 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-506.48755, -443.54822, -413.7226, -319.66022, -344.32617, -291.75385, -542.2349, -306.6467, -547.82837, -342.2399]
2025-05-09 10:09:16,896 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 10:09:16,901 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 20 hours, 53 minutes, 50 seconds)
2025-05-09 10:18:30,974 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 10:18:30,979 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 10:22:15,584 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 633.50562 ± 90.684
2025-05-09 10:22:15,597 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [583.6138, 487.31967, 471.3796, 616.2889, 700.2797, 726.59406, 720.9328, 701.2636, 615.6653, 711.7184]
2025-05-09 10:22:15,597 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 10:22:15,597 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1226 [INFO]: New best (633.51) for latency MM1Queue_a033_s075
2025-05-09 10:22:15,597 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-09 10:22:15,606 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-mbpac_memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 10:22:15,621 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 20 hours, 47 minutes)
2025-05-09 10:31:30,036 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 10:31:30,041 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 10:35:21,904 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1660.52441 ± 557.196
2025-05-09 10:35:21,907 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [403.847, 2172.4807, 1781.8413, 1888.5458, 1825.5549, 1626.1914, 2064.206, 806.281, 2117.4766, 1918.8186]
2025-05-09 10:35:21,907 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 10:35:21,907 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1226 [INFO]: New best (1660.52) for latency MM1Queue_a033_s075
2025-05-09 10:35:21,907 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-09 10:35:21,916 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-mbpac_memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 10:35:21,933 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 20 hours, 40 minutes, 8 seconds)
2025-05-09 10:44:38,294 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 10:44:38,298 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 10:48:23,309 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1652.48767 ± 971.002
2025-05-09 10:48:23,617 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2126.8232, 673.3771, 2655.48, 2531.347, 215.61537, 771.6598, 2284.8733, 2725.8142, 326.19104, 2213.695]
2025-05-09 10:48:23,617 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 10:48:23,620 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 20 hours, 29 minutes, 18 seconds)
2025-05-09 10:57:40,905 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 10:57:40,917 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 11:01:28,501 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2339.86938 ± 721.305
2025-05-09 11:01:28,503 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3071.7742, 1684.7864, 2875.7334, 2902.809, 2543.7148, 2559.757, 987.30695, 3041.9607, 1244.3436, 2486.5063]
2025-05-09 11:01:28,503 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 11:01:28,504 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1226 [INFO]: New best (2339.87) for latency MM1Queue_a033_s075
2025-05-09 11:01:28,504 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-09 11:01:28,513 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-mbpac_memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 11:01:28,530 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 20 hours, 25 minutes, 36 seconds)
2025-05-09 11:10:44,797 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 11:10:44,803 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 11:14:30,402 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3071.28027 ± 150.936
2025-05-09 11:14:30,406 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3066.6685, 2983.833, 3073.1643, 3062.232, 2865.9336, 3018.936, 3262.2405, 2834.2463, 3247.2385, 3298.3105]
2025-05-09 11:14:30,406 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 11:14:30,406 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1226 [INFO]: New best (3071.28) for latency MM1Queue_a033_s075
2025-05-09 11:14:30,406 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-09 11:14:30,415 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-mbpac_memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 11:14:30,431 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 20 hours, 13 minutes, 11 seconds)
2025-05-09 11:23:45,750 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 11:23:45,755 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 11:27:30,719 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2679.68970 ± 813.847
2025-05-09 11:27:30,742 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3163.4634, 3269.781, 740.52454, 3069.01, 2746.9246, 3071.0889, 3121.8625, 1474.0896, 3164.4043, 2975.7468]
2025-05-09 11:27:30,742 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 11:27:30,765 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 20 hours, 38 seconds)
2025-05-09 11:36:48,653 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 11:36:48,669 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 11:40:34,930 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2827.82495 ± 652.013
2025-05-09 11:40:35,681 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3344.1755, 3322.66, 1848.1057, 3201.9844, 3542.0527, 3168.2808, 3235.9749, 1767.7324, 1978.2494, 2869.0325]
2025-05-09 11:40:35,681 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 11:40:35,696 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 19 hours, 47 minutes, 10 seconds)
2025-05-09 11:49:52,850 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 11:49:53,279 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 11:53:40,458 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3350.96826 ± 308.298
2025-05-09 11:53:40,836 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3451.0518, 3785.094, 3262.0867, 3541.377, 3554.314, 3554.3835, 3247.964, 2575.8767, 3258.3074, 3279.23]
2025-05-09 11:53:40,836 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 11:53:40,836 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1226 [INFO]: New best (3350.97) for latency MM1Queue_a033_s075
2025-05-09 11:53:40,836 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-09 11:53:40,846 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-mbpac_memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 11:53:40,895 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 19 hours, 35 minutes, 10 seconds)
2025-05-09 12:02:55,515 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 12:02:55,530 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 12:06:44,104 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2698.51562 ± 1024.947
2025-05-09 12:06:44,127 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2129.3079, 2391.1218, 3635.5, 3553.702, 3403.9194, 191.88235, 3558.9463, 2541.4583, 3466.033, 2113.2852]
2025-05-09 12:06:44,127 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 12:06:44,150 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 19 hours, 21 minutes, 38 seconds)
2025-05-09 12:15:59,274 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 12:15:59,314 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 12:19:46,793 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3408.69531 ± 614.103
2025-05-09 12:19:47,174 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4148.4985, 3383.7212, 3838.1753, 3946.6401, 2785.0913, 2544.1162, 3672.4233, 2292.607, 3939.3252, 3536.3552]
2025-05-09 12:19:47,174 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 12:19:47,174 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1226 [INFO]: New best (3408.70) for latency MM1Queue_a033_s075
2025-05-09 12:19:47,174 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-09 12:19:47,184 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-mbpac_memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 12:19:47,201 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 19 hours, 8 minutes, 55 seconds)
2025-05-09 12:29:01,728 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 12:29:01,733 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 12:32:49,853 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3553.15479 ± 1050.731
2025-05-09 12:32:49,857 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3710.2017, 3940.9568, 3756.395, 486.72302, 3909.0532, 4259.666, 3856.25, 3383.258, 4263.1465, 3965.8977]
2025-05-09 12:32:49,857 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 12:32:49,857 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1226 [INFO]: New best (3553.15) for latency MM1Queue_a033_s075
2025-05-09 12:32:49,857 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-09 12:32:49,866 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-mbpac_memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 12:32:49,883 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 18 hours, 56 minutes, 32 seconds)
2025-05-09 12:42:05,509 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 12:42:05,514 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 12:45:50,642 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2868.81494 ± 1427.853
2025-05-09 12:45:50,996 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [470.553, 3721.526, 948.0237, 4171.6377, 4130.6123, 1045.3163, 3881.772, 3727.8022, 2436.1477, 4154.7603]
2025-05-09 12:45:50,996 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 12:45:51,000 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 18 hours, 42 minutes, 23 seconds)
2025-05-09 12:55:06,326 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 12:55:06,638 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 12:58:50,895 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4003.07031 ± 165.217
2025-05-09 12:58:50,898 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3967.3232, 4039.207, 4317.987, 4003.4775, 3979.8232, 3794.5708, 4183.1353, 4083.8962, 3710.6843, 3950.597]
2025-05-09 12:58:50,898 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 12:58:50,898 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1226 [INFO]: New best (4003.07) for latency MM1Queue_a033_s075
2025-05-09 12:58:50,898 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-09 12:58:50,907 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-mbpac_memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 12:58:50,923 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 18 hours, 27 minutes, 50 seconds)
2025-05-09 13:08:06,912 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 13:08:07,297 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 13:11:53,906 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3449.28369 ± 1347.070
2025-05-09 13:11:53,910 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3958.6104, 3939.0051, 4139.5117, 4121.0996, 33.94102, 3988.5564, 1711.0309, 4144.641, 4078.0933, 4378.3496]
2025-05-09 13:11:53,910 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 13:11:53,912 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 18 hours, 14 minutes, 44 seconds)
2025-05-09 13:21:10,587 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 13:21:10,627 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 13:25:01,727 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3806.77661 ± 902.834
2025-05-09 13:25:01,731 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4430.232, 4346.0557, 3780.1987, 4345.8926, 4166.548, 4400.248, 4480.3667, 3559.0972, 1408.145, 3150.9807]
2025-05-09 13:25:01,731 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 13:25:01,734 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 18 hours, 3 minutes, 1 second)
2025-05-09 13:34:19,596 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 13:34:19,601 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 13:38:03,460 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3398.89258 ± 1218.017
2025-05-09 13:38:03,463 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4134.065, 1805.7833, 1929.5947, 4119.9927, 4499.9834, 4525.801, 1166.7465, 3328.5264, 4552.5293, 3925.9033]
2025-05-09 13:38:03,463 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 13:38:03,466 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 17 hours, 49 minutes, 42 seconds)
2025-05-09 13:47:18,681 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 13:47:18,686 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 13:51:06,613 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3776.87646 ± 1002.014
2025-05-09 13:51:06,616 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4352.329, 4164.053, 2123.1145, 4446.4077, 1570.9185, 4274.8315, 3612.2695, 4455.2534, 4251.6836, 4517.9014]
2025-05-09 13:51:06,616 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 13:51:06,619 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 17 hours, 37 minutes, 13 seconds)
2025-05-09 14:00:21,111 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 14:00:21,116 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 14:04:03,028 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3561.16479 ± 1247.813
2025-05-09 14:04:03,264 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3874.3374, 1478.7084, 4710.5933, 1307.5253, 3779.5837, 4484.8276, 2482.323, 4439.4194, 4398.368, 4655.964]
2025-05-09 14:04:03,264 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 14:04:03,268 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 17 hours, 23 minutes, 17 seconds)
2025-05-09 14:13:16,730 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 14:13:16,736 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 14:17:07,494 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3469.92432 ± 1626.059
2025-05-09 14:17:07,746 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3564.0706, 365.28888, 3870.829, 4793.757, 4565.6987, 4304.4644, 4373.49, 4442.2524, 204.1495, 4215.2407]
2025-05-09 14:17:07,746 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 14:17:07,752 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 17 hours, 10 minutes, 38 seconds)
2025-05-09 14:26:23,974 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 14:26:24,281 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 14:30:12,176 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3645.87744 ± 1197.142
2025-05-09 14:30:12,181 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4461.9673, 4026.4587, 4352.8916, 4577.5293, 4391.089, 1603.7842, 4419.784, 4227.778, 3271.4521, 1126.041]
2025-05-09 14:30:12,181 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 14:30:12,184 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 16 hours, 56 minutes, 43 seconds)
2025-05-09 14:39:28,725 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 14:39:29,017 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 14:43:16,817 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4027.20166 ± 666.044
2025-05-09 14:43:16,826 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4160.215, 4272.3193, 4098.945, 2057.9622, 4323.741, 4413.2783, 4285.338, 4419.268, 4103.4824, 4137.4707]
2025-05-09 14:43:16,826 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 14:43:16,826 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1226 [INFO]: New best (4027.20) for latency MM1Queue_a033_s075
2025-05-09 14:43:16,826 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-09 14:43:16,836 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-mbpac_memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 14:43:16,854 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 16 hours, 44 minutes, 26 seconds)
2025-05-09 14:52:33,981 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 14:52:33,985 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 14:56:17,989 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4040.23242 ± 661.306
2025-05-09 14:56:17,991 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4441.938, 2176.4094, 4231.971, 3829.0725, 3910.6453, 4174.307, 4581.117, 4509.073, 4260.3345, 4287.4517]
2025-05-09 14:56:17,991 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 14:56:17,991 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1226 [INFO]: New best (4040.23) for latency MM1Queue_a033_s075
2025-05-09 14:56:17,991 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-09 14:56:18,000 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-mbpac_memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 14:56:18,017 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 16 hours, 30 minutes, 53 seconds)
2025-05-09 15:05:34,392 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 15:05:34,397 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 15:09:22,524 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4358.04980 ± 581.543
2025-05-09 15:09:22,526 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4393.079, 4308.595, 4879.33, 4499.4478, 4213.0347, 4725.707, 2721.7617, 4526.7876, 4504.038, 4808.7163]
2025-05-09 15:09:22,527 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 15:09:22,527 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1226 [INFO]: New best (4358.05) for latency MM1Queue_a033_s075
2025-05-09 15:09:22,527 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-09 15:09:22,535 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-mbpac_memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
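The latency name `MM1Queue_a033_s075` plausibly encodes an M/M/1 queue delay model with arrival rate 0.33 and service rate 0.75 — that reading is an assumption from the name alone. Under it, the steady-state sojourn time of a FIFO M/M/1 queue is exponentially distributed with rate μ − λ, so observation delays could be sampled as:

```python
import random

def mm1_delay(arrival_rate=0.33, service_rate=0.75, rng=random):
    """Steady-state M/M/1 sojourn time: exponential with rate mu - lambda.
    (Rates 0.33/0.75 are guessed from the name 'MM1Queue_a033_s075'.)"""
    assert service_rate > arrival_rate, "queue must be stable"
    return rng.expovariate(service_rate - arrival_rate)

random.seed(0)
samples = [mm1_delay() for _ in range(10000)]
print(sum(samples) / len(samples))  # empirical mean, close to 1 / (0.75 - 0.33)
```

The mean delay would then be 1/(μ − λ) ≈ 2.38 environment steps, which is the kind of stochastic lag a memory-augmented delayed-MDP agent like `mbpac_memdelay` has to compensate for.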
2025-05-09 15:09:22,552 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 16 hours, 19 minutes, 49 seconds)
2025-05-09 15:18:37,974 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 15:18:37,979 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 15:22:29,698 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4426.19678 ± 166.519
2025-05-09 15:22:29,778 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4715.1807, 4150.202, 4592.5234, 4347.5933, 4502.0596, 4220.6396, 4570.27, 4362.974, 4469.7236, 4330.8027]
2025-05-09 15:22:29,778 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 15:22:29,778 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1226 [INFO]: New best (4426.20) for latency MM1Queue_a033_s075
2025-05-09 15:22:29,778 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-09 15:22:29,788 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-mbpac_memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 15:22:29,806 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 16 hours, 7 minutes, 26 seconds)
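The `estimated time remaining` in each iteration line is consistent with extrapolating the average per-iteration wall time over the remaining iterations. A minimal sketch under that assumption — the real estimator may smooth or window differently:

```python
def eta_seconds(elapsed_s, done, total):
    """Estimate remaining wall time by linear extrapolation of progress."""
    if done == 0:
        return float("inf")
    return elapsed_s / done * (total - done)

def fmt_hms(seconds):
    """Format seconds in the log's 'H hours, M minutes, S seconds' style."""
    h, rem = divmod(int(seconds), 3600)
    m, s = divmod(rem, 60)
    return f"{h} hours, {m} minutes, {s} seconds"

# e.g. 27 of 100 iterations finished after ~6 hours of training
print(fmt_hms(eta_seconds(6 * 3600, 27, 100)))
```

At roughly 800 s per iteration (train + evaluation), this gives an ETA in the same ballpark as the log's own figures around iteration 27.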
2025-05-09 15:31:47,068 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 15:31:47,074 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 15:35:32,845 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3548.65552 ± 1467.398
2025-05-09 15:35:33,281 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4579.853, 4157.0063, 4003.699, 4482.6904, 4465.8887, 285.81738, 1038.5662, 3883.3945, 4239.68, 4349.9575]
2025-05-09 15:35:33,282 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 15:35:33,287 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 15 hours, 54 minutes, 8 seconds)
2025-05-09 15:44:49,212 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 15:44:49,217 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 15:48:33,798 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4041.87036 ± 1095.472
2025-05-09 15:48:33,854 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4224.282, 4341.6025, 1716.7169, 2143.2012, 4693.0386, 4260.915, 5152.232, 4917.916, 4517.348, 4451.4487]
2025-05-09 15:48:33,854 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 15:48:33,859 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 15 hours, 40 minutes, 4 seconds)
2025-05-09 15:57:50,053 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 15:57:50,551 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 16:01:36,403 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4144.40332 ± 917.523
2025-05-09 16:01:37,378 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4439.1147, 4211.936, 4666.7783, 2549.1482, 4467.807, 4916.3975, 2243.9048, 4232.352, 4566.868, 5149.728]
2025-05-09 16:01:37,379 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 16:01:37,387 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 15 hours, 27 minutes, 35 seconds)
2025-05-09 16:10:54,388 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 16:10:54,901 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 16:14:40,963 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3914.94678 ± 1588.765
2025-05-09 16:14:40,974 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4828.3057, 4414.0703, 4523.5186, 4394.0405, 4822.351, 1340.9675, 4847.1484, 249.41936, 4857.1123, 4872.5366]
2025-05-09 16:14:40,974 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 16:14:40,979 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 15 hours, 14 minutes, 17 seconds)
2025-05-09 16:23:55,048 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 16:23:55,052 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 16:27:44,306 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4217.32324 ± 472.641
2025-05-09 16:27:44,310 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3873.95, 4307.782, 4840.0127, 3710.197, 4524.258, 3176.5535, 4593.964, 4215.495, 4566.5547, 4364.4644]
2025-05-09 16:27:44,310 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 16:27:44,313 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 15 hours, 20 seconds)
2025-05-09 16:37:00,677 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 16:37:00,682 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 16:40:45,474 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4331.33301 ± 1367.126
2025-05-09 16:40:45,477 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [329.8705, 5025.6455, 5257.555, 4408.614, 5181.809, 4734.6807, 4292.998, 4688.435, 4537.9644, 4855.759]
2025-05-09 16:40:45,477 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 16:40:45,481 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 14 hours, 46 minutes, 45 seconds)
2025-05-09 16:50:01,311 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 16:50:01,317 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 16:53:48,835 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2839.91650 ± 2108.833
2025-05-09 16:53:49,311 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4609.848, 136.12653, 5298.4175, 737.54126, 4671.6396, 833.575, 22.894537, 4625.8193, 5048.5337, 2414.7686]
2025-05-09 16:53:49,311 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 16:53:49,317 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 14 hours, 34 minutes, 27 seconds)
2025-05-09 17:03:05,891 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 17:03:05,897 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 17:06:56,626 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3710.11792 ± 1585.998
2025-05-09 17:06:56,774 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5386.1333, 4655.5903, 4495.4604, 785.5844, 4684.2886, 1262.3103, 4310.637, 4592.04, 4893.1177, 2036.0142]
2025-05-09 17:06:56,775 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 17:06:56,781 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 14 hours, 22 minutes, 16 seconds)
2025-05-09 17:16:14,027 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 17:16:14,181 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 17:19:57,989 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4171.41748 ± 1742.048
2025-05-09 17:19:58,018 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5408.0, 4797.1836, 4926.407, 13.893126, 4801.8877, 5122.657, 4983.5986, 1576.6858, 4616.861, 5467.0044]
2025-05-09 17:19:58,018 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 17:19:58,022 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 14 hours, 8 minutes, 41 seconds)
2025-05-09 17:29:15,054 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 17:29:15,060 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 17:33:03,309 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4171.54248 ± 1274.239
2025-05-09 17:33:03,313 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5144.1475, 5189.846, 3500.959, 4900.4473, 2660.491, 3267.7358, 5508.1016, 4946.3647, 5068.166, 1529.1683]
2025-05-09 17:33:03,313 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 17:33:03,318 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 13 hours, 56 minutes, 3 seconds)
2025-05-09 17:42:21,049 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 17:42:21,054 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 17:46:10,059 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4284.92480 ± 1603.243
2025-05-09 17:46:10,063 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5300.248, 4924.9136, 5332.1445, 25.58202, 5033.403, 4943.798, 4853.6455, 4051.367, 2843.8901, 5540.2534]
2025-05-09 17:46:10,063 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 17:46:10,067 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 13 hours, 44 minutes, 9 seconds)
2025-05-09 17:55:26,808 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 17:55:26,813 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 17:59:10,359 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4754.45410 ± 506.037
2025-05-09 17:59:10,840 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5007.0107, 4892.598, 4900.522, 3395.7053, 4898.692, 5157.3223, 5202.4507, 5024.714, 4698.873, 4366.6494]
2025-05-09 17:59:10,840 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 17:59:10,840 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1226 [INFO]: New best (4754.45) for latency MM1Queue_a033_s075
2025-05-09 17:59:10,840 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-09 17:59:10,850 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-mbpac_memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 17:59:10,868 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 13 hours, 30 minutes, 27 seconds)
2025-05-09 18:08:27,649 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 18:08:27,976 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 18:12:18,433 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4219.34570 ± 1711.186
2025-05-09 18:12:18,437 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4881.316, 5037.7036, 5115.8315, 2150.6821, 5464.152, 5537.5728, 4215.7837, 5124.6816, 4768.6514, -102.91715]
2025-05-09 18:12:18,437 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 18:12:18,441 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 13 hours, 17 minutes, 24 seconds)
2025-05-09 18:21:34,170 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 18:21:34,552 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 18:25:19,564 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4367.26221 ± 1026.461
2025-05-09 18:25:19,571 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4819.1685, 4828.979, 5122.264, 2589.417, 4679.437, 4982.4927, 2087.8782, 4807.1934, 4905.062, 4850.732]
2025-05-09 18:25:19,571 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 18:25:19,576 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 13 hours, 4 minutes, 18 seconds)
2025-05-09 18:34:34,411 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 18:34:34,416 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 18:38:20,262 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4669.45996 ± 396.340
2025-05-09 18:38:21,964 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5040.6006, 4859.5054, 4768.9756, 4423.9404, 4286.1914, 4663.2837, 4792.3506, 3789.761, 5283.6606, 4786.327]
2025-05-09 18:38:21,964 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 18:38:21,969 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 12 hours, 50 minutes, 40 seconds)
2025-05-09 18:47:34,964 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 18:47:34,969 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 18:51:24,538 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4723.33887 ± 571.232
2025-05-09 18:51:24,541 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5582.653, 4316.6313, 4773.488, 4929.9014, 4489.4116, 4606.125, 3375.545, 4983.6694, 5306.3164, 4869.65]
2025-05-09 18:51:24,542 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 18:51:24,546 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 12 hours, 36 minutes, 47 seconds)
2025-05-09 19:00:40,529 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 19:00:40,535 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 19:04:29,604 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4094.60278 ± 1050.916
2025-05-09 19:04:29,607 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4236.77, 4835.972, 2360.4668, 5168.3965, 5158.3394, 4309.421, 2254.785, 4524.2163, 4932.789, 3164.87]
2025-05-09 19:04:29,607 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 19:04:29,612 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 12 hours, 24 minutes, 33 seconds)
2025-05-09 19:13:45,095 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 19:13:45,110 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 19:17:31,478 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4574.15039 ± 1075.454
2025-05-09 19:17:31,500 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5037.533, 4699.982, 5145.843, 2120.1326, 5300.9194, 2842.4553, 5331.121, 4872.6177, 5082.866, 5308.031]
2025-05-09 19:17:31,500 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 19:17:31,526 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 12 hours, 10 minutes, 26 seconds)
2025-05-09 19:26:46,534 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 19:26:46,542 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 19:30:38,570 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4552.51709 ± 1376.827
2025-05-09 19:30:38,575 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5053.17, 4498.8926, 521.49554, 4654.24, 5292.88, 5432.2246, 5442.6694, 4894.7764, 4950.048, 4784.7734]
2025-05-09 19:30:38,575 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 19:30:38,580 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 11 hours, 58 minutes, 29 seconds)
2025-05-09 19:39:54,189 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 19:39:54,196 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 19:43:43,359 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4395.57275 ± 1087.801
2025-05-09 19:43:43,364 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3626.1238, 1665.0739, 4836.192, 5320.9355, 5005.961, 4767.086, 5008.158, 3522.1157, 4892.266, 5311.817]
2025-05-09 19:43:43,364 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 19:43:43,369 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 11 hours, 45 minutes, 51 seconds)
2025-05-09 19:52:59,453 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 19:52:59,881 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 19:56:50,772 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4664.79639 ± 1638.943
2025-05-09 19:56:50,775 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5746.3677, 5061.8496, 5395.4653, 5441.7856, 5334.0522, 5403.16, 4925.2007, 4096.2383, 5329.825, -85.98179]
2025-05-09 19:56:50,775 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 19:56:50,780 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 11 hours, 33 minutes, 38 seconds)
2025-05-09 20:06:06,786 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 20:06:06,803 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 20:09:52,955 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5193.23779 ± 384.905
2025-05-09 20:09:52,959 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5473.517, 4903.72, 5154.102, 5907.623, 5226.0254, 5031.9346, 5181.4116, 4361.93, 5200.7163, 5491.4023]
2025-05-09 20:09:52,959 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 20:09:52,959 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1226 [INFO]: New best (5193.24) for latency MM1Queue_a033_s075
2025-05-09 20:09:52,959 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-09 20:09:52,968 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-mbpac_memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 20:09:52,987 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 11 hours, 20 minutes, 3 seconds)
2025-05-09 20:19:08,353 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 20:19:09,157 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 20:22:54,630 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5064.11523 ± 449.181
2025-05-09 20:22:54,634 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4707.607, 5473.331, 4611.048, 5136.1978, 4926.6196, 5234.2217, 4251.3984, 5776.414, 4931.6606, 5592.6553]
2025-05-09 20:22:54,634 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 20:22:54,639 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 11 hours, 6 minutes, 55 seconds)
2025-05-09 20:32:10,006 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 20:32:10,010 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 20:35:57,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4885.06738 ± 1059.549
2025-05-09 20:35:57,096 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5390.2607, 3260.1553, 2465.2346, 5813.609, 5173.9854, 5480.5347, 4783.791, 5353.7153, 5668.0596, 5461.332]
2025-05-09 20:35:57,096 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 20:35:57,101 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 10 hours, 53 minutes, 5 seconds)
2025-05-09 20:45:12,373 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 20:45:12,383 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 20:49:01,093 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3371.26172 ± 1903.761
2025-05-09 20:49:01,096 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2370.5474, 1994.732, 5188.9937, 4775.3276, 5697.0703, 2240.358, 5333.6626, 4837.281, 171.5667, 1103.0785]
2025-05-09 20:49:01,096 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 20:49:01,102 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 10 hours, 39 minutes, 53 seconds)
2025-05-09 20:58:19,462 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 20:58:19,468 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 21:02:06,583 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3563.92773 ± 2152.974
2025-05-09 21:02:06,590 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3820.072, 5324.34, 572.0757, 5124.1597, 4768.7773, 4693.5615, 110.241806, 5798.4346, 5027.75, 399.86163]
2025-05-09 21:02:06,590 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 21:02:06,596 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 10 hours, 26 minutes, 31 seconds)
2025-05-09 21:11:20,765 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 21:11:21,105 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 21:15:05,921 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5178.41357 ± 213.505
2025-05-09 21:15:05,925 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5374.8105, 5398.0713, 5235.981, 5168.024, 5321.4355, 5021.586, 4761.598, 5008.714, 5013.9146, 5480.0024]
2025-05-09 21:15:05,925 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 21:15:05,930 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 10 hours, 13 minutes, 1 second)
2025-05-09 21:24:20,395 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 21:24:20,400 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 21:28:09,306 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4923.01074 ± 983.719
2025-05-09 21:28:09,311 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5157.7744, 4570.4263, 5683.085, 5580.6045, 4825.2437, 5317.5093, 5150.9663, 2151.1543, 5133.5947, 5659.747]
2025-05-09 21:28:09,311 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 21:28:09,317 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 10 hours, 15 seconds)
2025-05-09 21:37:28,207 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 21:37:28,264 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 21:41:12,799 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5222.37012 ± 800.445
2025-05-09 21:41:12,803 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5529.5264, 5655.9556, 5279.0815, 5732.418, 3282.6604, 6043.9287, 4268.8438, 5220.137, 5273.0977, 5938.046]
2025-05-09 21:41:12,803 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 21:41:12,803 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1226 [INFO]: New best (5222.37) for latency MM1Queue_a033_s075
2025-05-09 21:41:12,803 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-09 21:41:12,813 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-mbpac_memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 21:41:12,833 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 9 hours, 47 minutes, 21 seconds)
2025-05-09 21:50:29,858 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 21:50:29,864 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 21:54:19,317 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4545.96875 ± 1673.795
2025-05-09 21:54:19,441 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5766.593, 5756.068, 2144.5283, 5203.9824, 4679.153, 2046.1477, 5784.8516, 1986.3472, 5982.1157, 6109.8965]
2025-05-09 21:54:19,441 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 21:54:19,447 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 9 hours, 34 minutes, 41 seconds)
2025-05-09 22:03:35,657 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 22:03:35,664 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 22:07:22,946 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5184.36182 ± 705.047
2025-05-09 22:07:22,949 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5020.254, 5452.5938, 5344.7515, 5658.6997, 5719.7554, 5129.7793, 5310.304, 5241.591, 5773.021, 3192.873]
2025-05-09 22:07:22,949 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 22:07:22,955 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 9 hours, 21 minutes, 20 seconds)
2025-05-09 22:16:40,676 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 22:16:40,681 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 22:20:24,242 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5065.71924 ± 726.531
2025-05-09 22:20:24,249 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3328.9631, 5282.251, 5187.6597, 5342.681, 5821.372, 5229.8745, 4203.2007, 5857.786, 5444.0264, 4959.3745]
2025-05-09 22:20:24,250 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 22:20:24,256 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 9 hours, 8 minutes, 33 seconds)
2025-05-09 22:29:39,530 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 22:29:39,610 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 22:33:28,544 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4486.16846 ± 1675.784
2025-05-09 22:33:28,606 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5242.6953, 5426.1074, 5712.94, 5550.4165, -32.823486, 5633.6323, 4496.1562, 3091.0999, 4803.454, 4938.004]
2025-05-09 22:33:28,606 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 22:33:28,633 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 8 hours, 55 minutes, 38 seconds)
2025-05-09 22:42:45,901 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 22:42:45,907 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 22:46:31,911 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4435.96924 ± 1522.699
2025-05-09 22:46:31,914 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2132.7903, 1103.8157, 5391.3467, 5141.4893, 5237.063, 5663.518, 3684.1296, 5041.2344, 5260.2017, 5704.1006]
2025-05-09 22:46:31,915 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 22:46:31,920 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 8 hours, 42 minutes, 32 seconds)
2025-05-09 22:55:48,970 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 22:55:48,976 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 22:59:33,486 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4028.77734 ± 1685.723
2025-05-09 22:59:33,491 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2403.1636, 1741.1473, 5239.2207, 5333.525, 884.51526, 5126.7305, 4974.059, 3437.3303, 5936.176, 5211.9067]
2025-05-09 22:59:33,491 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 22:59:33,497 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 8 hours, 28 minutes, 49 seconds)
2025-05-09 23:08:49,314 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 23:08:49,319 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 23:12:37,592 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4311.23633 ± 1515.829
2025-05-09 23:12:37,597 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5303.5747, 5340.033, 5810.983, 2181.6846, 5059.5776, 5093.489, 1839.6375, 4929.38, 5496.934, 2057.0696]
2025-05-09 23:12:37,597 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 23:12:37,603 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 8 hours, 15 minutes, 51 seconds)
2025-05-09 23:21:51,861 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 23:21:51,867 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 23:25:36,112 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4680.29053 ± 1716.999
2025-05-09 23:25:36,455 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5419.75, 5670.364, -108.50415, 5366.618, 5540.33, 5439.334, 5948.456, 3515.7485, 5176.5845, 4834.222]
2025-05-09 23:25:36,455 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 23:25:36,461 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 8 hours, 2 minutes, 30 seconds)
2025-05-09 23:34:51,235 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 23:34:51,499 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 23:38:37,439 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5420.72754 ± 283.010
2025-05-09 23:38:37,442 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5464.813, 5730.144, 5050.6743, 5557.0366, 5546.666, 5962.7637, 5342.47, 5086.382, 5075.3555, 5390.97]
2025-05-09 23:38:37,443 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 23:38:37,443 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1226 [INFO]: New best (5420.73) for latency MM1Queue_a033_s075
2025-05-09 23:38:37,443 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-09 23:38:37,452 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-mbpac_memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 23:38:37,959 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 7 hours, 49 minutes, 7 seconds)
2025-05-09 23:47:51,723 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 23:47:51,729 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 23:51:34,391 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5548.35059 ± 450.648
2025-05-09 23:51:34,399 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5225.188, 5209.9375, 5868.7, 4725.0933, 6001.042, 6057.124, 5381.009, 5553.789, 6217.178, 5244.444]
2025-05-09 23:51:34,399 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 23:51:34,399 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1226 [INFO]: New best (5548.35) for latency MM1Queue_a033_s075
2025-05-09 23:51:34,399 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-09 23:51:34,409 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-mbpac_memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 23:51:34,430 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 7 hours, 35 minutes, 17 seconds)
2025-05-10 00:00:49,086 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 00:00:49,202 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 00:04:37,855 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5655.60693 ± 406.680
2025-05-10 00:04:37,858 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5832.9717, 5347.451, 5947.158, 5166.281, 5165.1123, 6056.1157, 6063.7983, 5223.742, 6295.7188, 5457.719]
2025-05-10 00:04:37,858 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 00:04:37,858 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1226 [INFO]: New best (5655.61) for latency MM1Queue_a033_s075
2025-05-10 00:04:37,858 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-10 00:04:37,868 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-mbpac_memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 00:04:37,889 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 7 hours, 22 minutes, 29 seconds)
2025-05-10 00:13:53,033 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 00:13:53,038 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 00:17:38,115 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5105.14941 ± 963.241
2025-05-10 00:17:38,130 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [6114.7485, 2606.18, 5301.582, 6127.2603, 4527.8105, 4809.1206, 5500.784, 5687.3457, 5170.625, 5206.0376]
2025-05-10 00:17:38,131 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 00:17:38,138 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 7 hours, 9 minutes, 3 seconds)
2025-05-10 00:26:53,191 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 00:26:53,197 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 00:30:42,922 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5552.49121 ± 353.922
2025-05-10 00:30:43,419 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5724.6963, 5664.401, 6139.8267, 5630.82, 5245.702, 5416.7456, 5488.0923, 6026.6533, 5314.6133, 4873.3594]
2025-05-10 00:30:43,419 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 00:30:43,446 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 6 hours, 56 minutes, 44 seconds)
2025-05-10 00:39:59,601 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 00:39:59,607 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 00:43:50,835 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5569.35303 ± 942.140
2025-05-10 00:43:50,841 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5599.348, 6146.335, 5516.5146, 6506.5376, 5666.5034, 5718.819, 5967.064, 2950.9126, 6310.069, 5311.4272]
2025-05-10 00:43:50,841 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 00:43:50,848 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 6 hours, 44 minutes, 19 seconds)
2025-05-10 00:53:06,525 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 00:53:06,531 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 00:56:58,325 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4323.60156 ± 2454.802
2025-05-10 00:56:58,354 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1450.421, -24.14964, 6329.089, 5407.4795, 5837.785, 5723.631, 5473.309, 520.80206, 6694.4434, 5823.204]
2025-05-10 00:56:58,354 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 00:56:58,362 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 6 hours, 32 minutes, 23 seconds)
2025-05-10 01:06:14,034 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 01:06:14,041 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 01:10:02,108 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5094.08691 ± 1837.650
2025-05-10 01:10:02,149 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5978.8, 5388.425, 5939.049, 4187.614, 5288.6475, 6066.986, 5819.981, -128.02109, 6462.492, 5936.897]
2025-05-10 01:10:02,149 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 01:10:02,156 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 6 hours, 19 minutes, 20 seconds)
2025-05-10 01:19:17,504 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 01:19:17,510 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 01:23:01,884 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5052.22803 ± 1318.386
2025-05-10 01:23:01,888 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5498.4365, 5101.5396, 5641.1016, 5148.0596, 5391.5483, 1627.5138, 6387.953, 5743.3013, 3827.1597, 6155.6675]
2025-05-10 01:23:01,888 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 01:23:01,894 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 6 hours, 6 minutes, 13 seconds)
2025-05-10 01:32:17,417 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 01:32:17,423 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 01:36:07,034 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5448.07617 ± 754.595
2025-05-10 01:36:07,054 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5908.3887, 4887.1704, 3627.9766, 5234.9517, 5769.777, 6224.27, 5271.1626, 5574.016, 6474.0396, 5509.013]
2025-05-10 01:36:07,054 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 01:36:07,061 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 5 hours, 53 minutes, 7 seconds)
2025-05-10 01:45:23,521 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 01:45:23,763 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 01:49:07,820 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5141.92773 ± 1490.282
2025-05-10 01:49:08,126 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5702.3853, 5233.0034, 5989.0166, 6066.9756, 6826.059, 1979.1501, 6461.135, 5806.78, 2969.518, 4385.253]
2025-05-10 01:49:08,126 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 01:49:08,133 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 5 hours, 39 minutes, 29 seconds)
2025-05-10 01:58:23,819 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 01:58:23,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 02:02:10,618 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4869.97949 ± 1400.742
2025-05-10 02:02:10,641 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5477.2466, 1525.2694, 6108.0825, 5106.461, 3122.6238, 4816.121, 5063.8813, 5229.171, 6349.982, 5900.9595]
2025-05-10 02:02:10,641 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 02:02:10,669 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 5 hours, 26 minutes, 1 second)
2025-05-10 02:11:26,748 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 02:11:26,753 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 02:15:16,234 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5365.02734 ± 1595.657
2025-05-10 02:15:17,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5547.625, 6119.8105, 5416.575, 6182.702, 6622.8774, 5785.143, 5482.8296, 6064.6836, 5731.0625, 696.95966]
2025-05-10 02:15:17,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 02:15:17,835 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 5 hours, 13 minutes, 15 seconds)
2025-05-10 02:24:35,209 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 02:24:35,214 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 02:28:18,364 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5558.11768 ± 927.110
2025-05-10 02:28:18,367 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5700.868, 5410.9126, 6158.738, 2904.8862, 5664.1885, 5531.249, 6398.456, 5962.8315, 5959.5625, 5889.479]
2025-05-10 02:28:18,367 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 02:28:18,374 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 5 hours, 15 seconds)
2025-05-10 02:37:34,409 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 02:37:34,486 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 02:41:24,832 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5816.32959 ± 537.637
2025-05-10 02:41:24,837 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [6383.965, 5891.387, 5248.4985, 5536.061, 6512.977, 5937.392, 6139.082, 4623.554, 6170.633, 5719.75]
2025-05-10 02:41:24,837 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 02:41:24,837 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1226 [INFO]: New best (5816.33) for latency MM1Queue_a033_s075
2025-05-10 02:41:24,837 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-10 02:41:24,847 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-mbpac_memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 02:41:24,870 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 4 hours, 47 minutes, 18 seconds)
2025-05-10 02:50:42,193 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 02:50:42,199 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 02:54:29,555 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5898.36377 ± 409.101
2025-05-10 02:54:29,558 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5512.7363, 5479.1685, 5514.89, 6189.342, 5460.973, 5608.2095, 6559.346, 5983.0806, 6262.2983, 6413.5947]
2025-05-10 02:54:29,558 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 02:54:29,558 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1226 [INFO]: New best (5898.36) for latency MM1Queue_a033_s075
2025-05-10 02:54:29,558 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-10 02:54:29,568 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-mbpac_memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 02:54:29,590 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 4 hours, 34 minutes, 30 seconds)
2025-05-10 03:03:44,493 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 03:03:44,508 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 03:07:29,451 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5319.71191 ± 1548.833
2025-05-10 03:07:29,456 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [797.07245, 5604.8804, 5749.8735, 6122.495, 5502.2246, 5549.002, 5433.976, 6709.8755, 5869.5825, 5858.1357]
2025-05-10 03:07:29,456 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 03:07:29,463 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 4 hours, 21 minutes, 15 seconds)
2025-05-10 03:16:44,393 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 03:16:44,399 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 03:20:30,728 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5336.69092 ± 499.550
2025-05-10 03:20:31,391 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5583.438, 5502.933, 5452.476, 6004.202, 4110.8237, 5812.9126, 5071.422, 5017.5273, 5541.1196, 5270.0547]
2025-05-10 03:20:31,391 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 03:20:31,399 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 4 hours, 7 minutes, 51 seconds)
2025-05-10 03:29:47,533 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 03:29:47,539 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 03:33:38,246 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4667.41455 ± 1727.630
2025-05-10 03:33:38,319 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [6338.3887, 5523.7676, 6091.6167, 5614.5513, 2384.437, 5495.9727, 2680.9507, 5577.5317, 1256.3777, 5710.5464]
2025-05-10 03:33:38,319 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 03:33:38,326 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 3 hours, 55 minutes, 11 seconds)
2025-05-10 03:42:53,343 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 03:42:53,583 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 03:46:45,296 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4512.17139 ± 1902.199
2025-05-10 03:46:45,300 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5700.8154, 4743.4683, 4532.428, 1925.5343, 5374.9824, 5736.7534, 5583.1157, -98.44265, 5792.03, 5831.029]
2025-05-10 03:46:45,300 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 03:46:45,307 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 3 hours, 42 minutes, 9 seconds)
2025-05-10 03:56:00,263 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 03:56:00,268 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 03:59:44,953 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5579.30225 ± 473.479
2025-05-10 03:59:44,957 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5703.0654, 5275.289, 5852.262, 5846.907, 5436.4795, 6253.2886, 4934.595, 4942.891, 5218.8154, 6329.4277]
2025-05-10 03:59:44,957 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 03:59:44,964 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 3 hours, 28 minutes, 49 seconds)
2025-05-10 04:08:58,642 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 04:08:58,649 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 04:12:46,875 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5463.21387 ± 1746.833
2025-05-10 04:12:46,878 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [352.6895, 5681.1763, 6789.7925, 5900.5835, 6497.263, 5400.1416, 5947.7827, 5802.241, 5923.6177, 6336.846]
2025-05-10 04:12:46,878 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 04:12:46,886 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 3 hours, 15 minutes, 52 seconds)
2025-05-10 04:22:02,438 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 04:22:02,446 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 04:25:52,800 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5348.19434 ± 1221.864
2025-05-10 04:25:53,070 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [6060.4556, 6009.038, 1898.3055, 5564.9946, 5309.286, 6018.986, 5765.081, 5367.6445, 6484.9043, 5003.246]
2025-05-10 04:25:53,070 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 04:25:53,079 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 3 hours, 3 minutes)
2025-05-10 04:35:09,859 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 04:35:09,915 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 04:38:58,174 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5483.47705 ± 1034.282
2025-05-10 04:38:58,197 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [6329.3467, 5746.677, 5655.7544, 2470.611, 5725.272, 6065.176, 5481.874, 6070.156, 5723.7134, 5566.1875]
2025-05-10 04:38:58,197 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 04:38:58,226 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 2 hours, 49 minutes, 51 seconds)
2025-05-10 04:48:12,111 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 04:48:12,125 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 04:51:56,818 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5153.48828 ± 1517.026
2025-05-10 04:51:56,820 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2190.9417, 5858.9854, 5999.807, 5779.402, 6027.7383, 5968.9624, 5922.7183, 5825.1235, 2056.0513, 5905.1533]
2025-05-10 04:51:56,821 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 04:51:56,828 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 2 hours, 36 minutes, 27 seconds)
2025-05-10 05:01:10,933 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 05:01:11,763 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 05:05:00,641 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5779.67871 ± 323.211
2025-05-10 05:05:00,653 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5931.7217, 6396.7427, 5834.8813, 5528.453, 5725.4155, 5485.098, 6285.203, 5369.159, 5549.886, 5690.2256]
2025-05-10 05:05:00,654 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 05:05:00,662 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 2 hours, 23 minutes, 34 seconds)
2025-05-10 05:14:17,384 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 05:14:17,719 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 05:18:09,138 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5678.68262 ± 1369.735
2025-05-10 05:18:09,439 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5716.8726, 6174.6753, 6109.459, 5810.2446, 6352.496, 6190.099, 1666.3904, 6609.942, 5684.1978, 6472.445]
2025-05-10 05:18:09,439 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 05:18:09,448 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 2 hours, 10 minutes, 45 seconds)
2025-05-10 05:27:25,573 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 05:27:25,578 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 05:31:08,675 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5283.37012 ± 1199.556
2025-05-10 05:31:08,678 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5775.305, 4343.8403, 5379.3013, 6311.684, 5732.6855, 5971.105, 5518.8306, 6116.867, 2022.9615, 5661.1245]
2025-05-10 05:31:08,678 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 05:31:08,686 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 1 hour, 57 minutes, 28 seconds)
2025-05-10 05:40:25,199 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 05:40:25,317 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 05:44:13,514 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4214.06445 ± 2067.928
2025-05-10 05:44:14,145 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-112.24025, 5680.011, 5687.355, 4849.427, 5058.5615, 4484.825, 5045.533, 415.99078, 5641.9844, 5389.199]
2025-05-10 05:44:14,145 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 05:44:14,156 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 1 hour, 44 minutes, 25 seconds)
2025-05-10 05:53:30,566 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 05:53:30,695 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 05:57:15,379 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5498.85547 ± 1206.961
2025-05-10 05:57:15,384 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5527.3877, 6258.5913, 2016.6069, 5260.865, 5875.574, 5959.139, 6120.225, 5535.4585, 6346.096, 6088.6094]
2025-05-10 05:57:15,384 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 05:57:15,392 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 31 minutes, 25 seconds)
2025-05-10 06:06:30,057 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 06:06:30,064 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 06:10:15,043 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5436.45605 ± 1100.720
2025-05-10 06:10:15,494 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [6035.1914, 5633.579, 5796.098, 6102.6685, 3437.3167, 6278.7827, 5717.009, 6343.7407, 5892.308, 3127.8674]
2025-05-10 06:10:15,494 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 06:10:15,502 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 18 minutes, 17 seconds)
2025-05-10 06:19:31,576 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 06:19:32,197 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 06:23:15,970 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5041.58545 ± 1585.373
2025-05-10 06:23:16,380 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4973.505, 5370.58, 6099.223, 4972.697, 6626.128, 3232.904, 5766.1196, 6219.318, 6018.774, 1136.6077]
2025-05-10 06:23:16,380 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 06:23:16,388 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 5 minutes, 6 seconds)
2025-05-10 06:32:26,614 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 06:32:26,619 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 06:36:09,737 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5568.53320 ± 1500.714
2025-05-10 06:36:09,741 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [6044.227, 6432.5303, 5952.098, 5765.028, 6454.179, 5880.6665, 6181.6685, 1124.1112, 6146.3027, 5704.519]
2025-05-10 06:36:09,741 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 06:36:09,750 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 52 minutes)
2025-05-10 06:45:18,069 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 06:45:18,422 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 06:49:01,928 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5525.30811 ± 650.520
2025-05-10 06:49:01,932 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [6067.151, 5373.755, 5545.6094, 5641.587, 5446.7197, 4022.8, 6317.6304, 5818.1426, 6189.3853, 4830.3]
2025-05-10 06:49:01,932 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 06:49:01,940 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 38 minutes, 52 seconds)
2025-05-10 06:58:09,069 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 06:58:09,074 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 07:01:52,399 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4959.12207 ± 1429.117
2025-05-10 07:01:52,403 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5758.558, 6052.6787, 1810.4308, 5868.2427, 5785.289, 2550.6042, 4952.9756, 5522.645, 5875.4893, 5414.3057]
2025-05-10 07:01:52,403 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 07:01:52,411 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 25 minutes, 50 seconds)
2025-05-10 07:11:02,163 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 07:11:02,170 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 07:14:46,361 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5450.88379 ± 1510.574
2025-05-10 07:14:46,365 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [6150.599, 5682.352, 1075.7944, 5757.017, 6047.2583, 6607.6865, 6222.988, 6171.7144, 5061.6367, 5731.7915]
2025-05-10 07:14:46,365 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 07:14:46,374 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 12 minutes, 54 seconds)
2025-05-10 07:23:56,078 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 07:23:56,083 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 07:27:37,866 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4495.71973 ± 1991.599
2025-05-10 07:27:38,236 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1222 [DEBUG]: All rewards: [6060.488, 5347.614, 1209.6954, 5343.273, 3961.0486, 219.4224, 6035.9487, 6118.6494, 5358.1836, 5302.876]
2025-05-10 07:27:38,236 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 07:27:38,245 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1251 [DEBUG]: Training session finished
