2025-05-13 09:06:37,858 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-bpql-mda-mem16
2025-05-13 09:06:37,858 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-bpql-mda-mem16
2025-05-13 09:06:37,858 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x14efb9269210>}
2025-05-13 09:06:37,858 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1111 [DEBUG]: using device: cuda
2025-05-13 09:06:37,863 baseline-bpql-mda-noisy-halfcheetah:91 [WARNING]: args.assumed_delay != args.horizon: 16 != 24
2025-05-13 09:06:37,863 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1133 [INFO]: Creating new trainer
2025-05-13 09:06:37,879 baseline-bpql-mda-noisy-halfcheetah:119 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-05-13 09:06:37,880 baseline-bpql-mda-noisy-halfcheetah:120 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-13 09:06:37,885 baseline-bpql-mda-noisy-halfcheetah:149 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Identity()
  (net_rec): GRU(6, 384, batch_first=True)
)
2025-05-13 09:06:38,631 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1194 [DEBUG]: Starting training session...
2025-05-13 09:06:38,631 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 1/100
2025-05-13 09:10:32,101 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:10:49,050 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -329.56561 ± 8.279
2025-05-13 09:10:49,051 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-334.7277, -325.7555, -328.29483, -333.9693, -340.6329, -318.18173, -329.10825, -325.7404, -342.97165, -316.27383]
2025-05-13 09:10:49,051 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:10:49,051 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (-329.57) for latency MM1Queue_a033_s075
2025-05-13 09:10:49,057 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 6 hours, 53 minutes, 12 seconds)
2025-05-13 09:14:47,230 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:15:03,944 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -178.43292 ± 48.780
2025-05-13 09:15:03,944 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-289.65616, -169.82602, -140.46864, -108.34302, -121.24509, -205.21999, -184.67274, -175.79918, -209.42508, -179.67331]
2025-05-13 09:15:03,944 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:15:03,944 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (-178.43) for latency MM1Queue_a033_s075
2025-05-13 09:15:03,954 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 6 hours, 52 minutes, 40 seconds)
2025-05-13 09:19:02,367 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:19:19,038 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 6.54119 ± 165.251
2025-05-13 09:19:19,039 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-42.973087, 501.16376, -54.706657, -63.84026, -38.93074, -42.533596, -29.675121, -39.165295, -59.77681, -64.15024]
2025-05-13 09:19:19,039 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:19:19,039 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (6.54) for latency MM1Queue_a033_s075
2025-05-13 09:19:19,045 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 6 hours, 49 minutes, 46 seconds)
2025-05-13 09:23:17,589 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:23:33,974 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 787.55267 ± 68.283
2025-05-13 09:23:33,974 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [796.54504, 743.4253, 810.58984, 851.72253, 710.1437, 669.4737, 913.9506, 751.33575, 839.761, 788.5798]
2025-05-13 09:23:33,974 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:23:33,974 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (787.55) for latency MM1Queue_a033_s075
2025-05-13 09:23:33,986 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 6 hours, 46 minutes, 8 seconds)
2025-05-13 09:27:32,191 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:27:48,627 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1325.02820 ± 520.632
2025-05-13 09:27:48,627 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2496.5732, 911.8009, 1026.0657, 912.1117, 865.97485, 1544.6027, 1163.5483, 1598.9686, 1886.9093, 843.72687]
2025-05-13 09:27:48,627 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:27:48,627 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (1325.03) for latency MM1Queue_a033_s075
2025-05-13 09:27:48,633 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 6 hours, 42 minutes, 10 seconds)
2025-05-13 09:31:46,568 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:32:02,913 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1459.24048 ± 779.916
2025-05-13 09:32:02,913 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [955.28156, 2449.1812, 1005.42786, 1187.8583, 880.5297, 980.36816, 836.59827, 837.5602, 2762.2344, 2697.3655]
2025-05-13 09:32:02,913 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:32:02,914 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (1459.24) for latency MM1Queue_a033_s075
2025-05-13 09:32:02,923 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 6 hours, 39 minutes, 8 seconds)
2025-05-13 09:36:01,562 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:36:18,298 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2500.82178 ± 695.568
2025-05-13 09:36:18,298 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2244.6992, 1338.2797, 1951.3848, 1997.8218, 3227.3535, 3148.6243, 2645.5513, 1787.6266, 3369.2683, 3297.6086]
2025-05-13 09:36:18,298 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:36:18,298 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (2500.82) for latency MM1Queue_a033_s075
2025-05-13 09:36:18,308 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 6 hours, 35 minutes, 2 seconds)
2025-05-13 09:40:16,652 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:40:32,932 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1828.25232 ± 433.532
2025-05-13 09:40:32,932 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2237.6272, 1593.4497, 1955.0548, 1794.0447, 2133.2964, 1868.2476, 1412.3402, 1113.4744, 1484.7579, 2690.2295]
2025-05-13 09:40:32,932 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:40:32,938 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 6 hours, 30 minutes, 39 seconds)
2025-05-13 09:44:31,365 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:44:47,945 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1900.12671 ± 1113.379
2025-05-13 09:44:47,945 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1909.652, 3366.7043, 2600.935, 1173.9532, 2479.4338, 1377.5247, 3723.5625, 1522.7263, -204.76894, 1051.5441]
2025-05-13 09:44:47,945 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:44:47,952 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 6 hours, 26 minutes, 26 seconds)
2025-05-13 09:48:44,875 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:49:01,327 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2029.64575 ± 1292.611
2025-05-13 09:49:01,327 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1356.9619, 3452.3354, 1418.9214, 4272.719, 4101.8564, 1061.0514, 1071.8618, 1260.3479, 630.9356, 1669.466]
2025-05-13 09:49:01,327 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:49:01,337 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 6 hours, 21 minutes, 48 seconds)
2025-05-13 09:52:57,320 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:53:13,602 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2565.09180 ± 1101.771
2025-05-13 09:53:13,602 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1550.6304, 2628.6094, 1790.4304, 3727.4426, 3030.51, 1439.9407, 1489.5002, 4076.9294, 4358.4717, 1558.4524]
2025-05-13 09:53:13,602 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:53:13,602 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (2565.09) for latency MM1Queue_a033_s075
2025-05-13 09:53:13,611 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 6 hours, 16 minutes, 58 seconds)
2025-05-13 09:57:09,038 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 09:57:25,358 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2217.75244 ± 1034.011
2025-05-13 09:57:25,358 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1369.8157, 1292.2123, 3512.8643, 1496.257, 1430.3912, 1260.594, 3989.8057, 1881.3676, 2268.2725, 3675.946]
2025-05-13 09:57:25,358 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:57:25,366 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 6 hours, 11 minutes, 40 seconds)
2025-05-13 10:01:20,711 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:01:36,999 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2110.17578 ± 955.337
2025-05-13 10:01:36,999 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1186.1014, 2498.3982, 3807.8198, 1320.2877, 2475.2334, 1291.6842, 3668.059, 1676.1847, 1017.0089, 2160.9788]
2025-05-13 10:01:36,999 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:01:37,006 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 6 hours, 6 minutes, 34 seconds)
2025-05-13 10:05:32,790 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:05:48,816 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2128.05127 ± 781.165
2025-05-13 10:05:48,816 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1701.1257, 2151.5854, 3994.7183, 1982.8582, 1342.353, 1435.7548, 3140.3313, 1820.9421, 1697.9988, 2012.845]
2025-05-13 10:05:48,816 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:05:48,824 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 6 hours, 1 minute, 27 seconds)
2025-05-13 10:09:43,825 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:09:59,870 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3352.75854 ± 994.772
2025-05-13 10:09:59,870 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4207.1494, 3699.8826, 1120.7958, 3757.324, 4059.7468, 3778.965, 3119.0256, 4028.0972, 3940.7217, 1815.8762]
2025-05-13 10:09:59,870 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:09:59,870 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (3352.76) for latency MM1Queue_a033_s075
2025-05-13 10:09:59,878 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 5 hours, 56 minutes, 35 seconds)
2025-05-13 10:13:55,040 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:14:11,355 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3133.94922 ± 1041.361
2025-05-13 10:14:11,355 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3552.5786, 3647.746, 4103.1997, 3757.479, 3609.467, 1096.5308, 3483.22, 1064.1752, 3453.2007, 3571.895]
2025-05-13 10:14:11,355 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:14:11,362 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 5 hours, 52 minutes, 10 seconds)
2025-05-13 10:18:06,252 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:18:22,322 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2058.53955 ± 878.651
2025-05-13 10:18:22,322 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1527.1833, 1803.262, 2925.189, 1854.649, 1977.6243, 1211.0148, 4279.897, 2217.011, 1309.3741, 1480.1923]
2025-05-13 10:18:22,322 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:18:22,329 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 5 hours, 47 minutes, 45 seconds)
2025-05-13 10:22:17,740 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:22:34,040 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2596.98926 ± 1121.115
2025-05-13 10:22:34,040 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3925.1208, 1361.9578, 4036.4622, 4311.2583, 2631.4358, 1192.0112, 1556.8174, 2958.7896, 1571.7448, 2424.2964]
2025-05-13 10:22:34,040 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:22:34,049 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 5 hours, 43 minutes, 35 seconds)
2025-05-13 10:26:28,898 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:26:45,155 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3420.57227 ± 874.520
2025-05-13 10:26:45,155 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3815.6072, 3674.0745, 3998.9219, 3762.4597, 3842.7603, 2686.415, 1049.4583, 3426.174, 4080.038, 3869.8108]
2025-05-13 10:26:45,155 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:26:45,156 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (3420.57) for latency MM1Queue_a033_s075
2025-05-13 10:26:45,164 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 5 hours, 39 minutes, 12 seconds)
2025-05-13 10:30:40,086 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:30:56,399 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3567.08203 ± 854.202
2025-05-13 10:30:56,399 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4208.2456, 4160.949, 1517.5872, 4256.11, 4339.2974, 2958.4436, 4149.1396, 3889.3616, 3141.158, 3050.5269]
2025-05-13 10:30:56,399 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:30:56,399 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (3567.08) for latency MM1Queue_a033_s075
2025-05-13 10:30:56,407 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 5 hours, 35 minutes, 4 seconds)
2025-05-13 10:34:51,413 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:35:07,391 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2372.82959 ± 875.989
2025-05-13 10:35:07,391 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1244.5408, 3072.8967, 2489.0266, 2303.4797, 3648.7476, 1479.8868, 1760.3479, 3172.8916, 1175.1354, 3381.3435]
2025-05-13 10:35:07,391 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:35:07,398 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 5 hours, 30 minutes, 45 seconds)
2025-05-13 10:39:02,661 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:39:18,631 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3149.75146 ± 1214.691
2025-05-13 10:39:18,631 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3899.9731, 4288.9985, 1306.288, 3598.9446, 4268.565, 4233.693, 3108.8647, 1697.823, 3966.9153, 1127.4496]
2025-05-13 10:39:18,631 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:39:18,640 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 5 hours, 26 minutes, 38 seconds)
2025-05-13 10:43:13,845 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:43:29,856 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2833.77637 ± 1342.258
2025-05-13 10:43:29,856 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4145.7314, 1047.2365, 3584.701, 4091.2256, 4540.8438, 4152.1143, 1753.3865, 2526.4338, 1159.554, 1336.5369]
2025-05-13 10:43:29,856 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:43:29,864 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 5 hours, 22 minutes, 19 seconds)
2025-05-13 10:47:24,973 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:47:40,993 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3282.10815 ± 1221.296
2025-05-13 10:47:40,993 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3986.7158, 2801.225, 4010.8271, 4241.9653, 4412.4546, 1999.5253, 1626.1705, 4217.5044, 1080.76, 4443.932]
2025-05-13 10:47:40,993 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:47:41,001 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 5 hours, 18 minutes, 8 seconds)
2025-05-13 10:51:35,765 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:51:51,697 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2999.19092 ± 1041.131
2025-05-13 10:51:51,697 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2975.973, 4351.3228, 1927.2867, 2186.3635, 2437.5696, 3989.1187, 2363.3079, 1410.1233, 4498.8184, 3852.0244]
2025-05-13 10:51:51,697 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:51:51,706 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 5 hours, 13 minutes, 49 seconds)
2025-05-13 10:55:46,613 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 10:56:02,565 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2693.58032 ± 1253.526
2025-05-13 10:56:02,566 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [995.64044, 1699.4335, 4327.8896, 1734.1824, 1148.8593, 4003.6333, 4205.628, 3432.4817, 1882.8988, 3505.157]
2025-05-13 10:56:02,566 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:56:02,577 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 5 hours, 9 minutes, 36 seconds)
2025-05-13 10:59:57,509 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:00:13,444 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2953.71509 ± 865.812
2025-05-13 11:00:13,445 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2336.8047, 4266.98, 3432.7522, 1731.4529, 1708.2218, 4158.928, 2338.1074, 3459.1484, 3122.957, 2981.7974]
2025-05-13 11:00:13,445 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:00:13,455 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 5 hours, 5 minutes, 20 seconds)
2025-05-13 11:04:08,635 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:04:24,810 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3232.69360 ± 1017.217
2025-05-13 11:04:24,810 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4031.0889, 1580.7446, 4330.027, 3788.0928, 2263.4146, 4025.8186, 1547.1295, 3973.2764, 3872.3972, 2914.9478]
2025-05-13 11:04:24,810 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:04:24,820 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 5 hours, 1 minute, 11 seconds)
2025-05-13 11:08:19,972 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:08:36,395 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3529.73706 ± 899.066
2025-05-13 11:08:36,395 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2756.6948, 1670.8397, 3895.0403, 3352.9963, 4141.209, 2414.6455, 4074.8357, 4197.581, 4423.9146, 4369.6123]
2025-05-13 11:08:36,395 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:08:36,415 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 4 hours, 57 minutes, 6 seconds)
2025-05-13 11:12:31,940 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:12:47,938 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3272.89014 ± 830.003
2025-05-13 11:12:47,938 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1945.8156, 3812.787, 3719.4492, 2579.0635, 4499.602, 4154.479, 3685.249, 2710.1504, 3504.1455, 2118.161]
2025-05-13 11:12:47,938 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:12:47,947 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 4 hours, 53 minutes, 7 seconds)
2025-05-13 11:16:43,248 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:16:59,178 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2663.83057 ± 1293.391
2025-05-13 11:16:59,178 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4307.054, 1724.5165, 4071.2742, 1423.3995, 1263.213, 1652.9746, 4080.5217, 1447.4823, 2271.2988, 4396.572]
2025-05-13 11:16:59,178 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:16:59,189 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 4 hours, 49 minutes, 1 second)
2025-05-13 11:20:54,700 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:21:10,638 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2570.19482 ± 1318.422
2025-05-13 11:21:10,638 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4178.429, 1749.1044, 1230.0468, 4225.601, 4318.292, 2043.8745, 1289.5939, 3903.2197, 1457.7645, 1306.0209]
2025-05-13 11:21:10,638 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:21:10,646 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 4 hours, 44 minutes, 57 seconds)
2025-05-13 11:25:06,002 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:25:22,043 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3420.84961 ± 896.148
2025-05-13 11:25:22,043 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1186.7292, 4336.9854, 2859.6357, 4343.8325, 4061.0198, 3111.1433, 4052.9844, 3319.9895, 3229.9897, 3706.1853]
2025-05-13 11:25:22,043 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:25:22,054 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 4 hours, 40 minutes, 46 seconds)
2025-05-13 11:29:17,525 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:29:33,544 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2487.13672 ± 937.763
2025-05-13 11:29:33,544 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3692.4216, 1696.0217, 3172.215, 2366.8142, 4456.9517, 1376.7954, 2180.368, 1897.2391, 1674.9006, 2357.6416]
2025-05-13 11:29:33,544 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:29:33,552 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 4 hours, 36 minutes, 34 seconds)
2025-05-13 11:33:28,641 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:33:44,980 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2589.11035 ± 1245.460
2025-05-13 11:33:44,980 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1540.8658, 1509.3429, 4244.015, 4441.3374, 2691.5222, 4475.062, 2457.3176, 1541.8905, 1522.0951, 1467.6534]
2025-05-13 11:33:44,980 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:33:44,993 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 4 hours, 32 minutes, 21 seconds)
2025-05-13 11:37:40,250 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:37:56,345 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3046.06909 ± 1345.284
2025-05-13 11:37:56,345 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1659.2921, 1590.1726, 4029.5793, 1218.9077, 4194.9883, 4003.5583, 4415.967, 3981.462, 1183.7438, 4183.0176]
2025-05-13 11:37:56,345 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:37:56,358 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 4 hours, 28 minutes, 11 seconds)
2025-05-13 11:41:51,446 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:42:07,510 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3422.32666 ± 901.856
2025-05-13 11:42:07,510 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4078.1218, 2731.6863, 2399.0854, 2582.8306, 4290.512, 4360.8643, 2090.7598, 4220.2637, 2948.6785, 4520.463]
2025-05-13 11:42:07,510 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:42:07,521 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 4 hours, 23 minutes, 56 seconds)
2025-05-13 11:46:02,787 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:46:19,169 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2675.26489 ± 1126.791
2025-05-13 11:46:19,169 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1408.7667, 3962.803, 3230.5964, 1452.1759, 3551.2505, 2422.4316, 1653.2673, 1158.8938, 3661.6301, 4250.831]
2025-05-13 11:46:19,169 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:46:19,178 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 4 hours, 19 minutes, 48 seconds)
2025-05-13 11:50:14,495 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:50:30,714 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2683.41455 ± 1108.633
2025-05-13 11:50:30,714 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2849.947, 2687.727, 1173.8997, 1784.8337, 2820.7393, 1200.9531, 4591.7744, 2255.285, 3054.1873, 4414.7974]
2025-05-13 11:50:30,714 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:50:30,724 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 4 hours, 15 minutes, 37 seconds)
2025-05-13 11:54:25,782 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:54:41,786 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2581.23730 ± 1357.339
2025-05-13 11:54:41,786 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4443.901, 2247.2148, 1513.4468, 4311.696, 4190.101, 1161.1584, 1226.6902, 1197.6604, 1690.2765, 3830.2283]
2025-05-13 11:54:41,787 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:54:41,795 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 4 hours, 11 minutes, 21 seconds)
2025-05-13 11:58:37,100 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 11:58:53,432 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3146.45703 ± 1249.499
2025-05-13 11:58:53,432 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4297.1167, 4396.065, 4547.9414, 2346.355, 2741.8254, 1206.1215, 4015.413, 1577.703, 1947.9762, 4388.052]
2025-05-13 11:58:53,432 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:58:53,445 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 4 hours, 7 minutes, 13 seconds)
2025-05-13 12:02:48,776 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:03:04,746 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2923.41895 ± 1017.501
2025-05-13 12:03:04,747 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1843.2775, 1863.8516, 2940.9175, 4523.084, 2560.2817, 1783.1821, 4126.0796, 3485.3835, 4120.8633, 1987.2671]
2025-05-13 12:03:04,747 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:03:04,757 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 4 hours, 3 minutes, 3 seconds)
2025-05-13 12:06:59,973 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:07:15,981 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3760.15039 ± 768.412
2025-05-13 12:07:15,981 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4139.6333, 4136.166, 3076.4146, 4015.4946, 4033.9338, 4253.967, 4554.562, 4084.3496, 3542.8992, 1764.0847]
2025-05-13 12:07:15,981 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:07:15,981 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (3760.15) for latency MM1Queue_a033_s075
2025-05-13 12:07:15,993 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 3 hours, 58 minutes, 47 seconds)
2025-05-13 12:11:10,951 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:11:27,292 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2967.67847 ± 1025.398
2025-05-13 12:11:27,292 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2267.701, 4079.4297, 1730.4895, 2835.913, 3479.7566, 3571.3203, 1456.834, 4325.548, 4080.1013, 1849.6924]
2025-05-13 12:11:27,292 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:11:27,305 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 3 hours, 54 minutes, 33 seconds)
2025-05-13 12:15:22,426 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:15:38,589 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3099.13770 ± 1118.311
2025-05-13 12:15:38,589 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4368.5366, 4480.675, 2214.8477, 3306.6348, 4070.812, 2955.8665, 1659.3694, 2322.882, 1331.2625, 4280.4917]
2025-05-13 12:15:38,589 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:15:38,604 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 3 hours, 50 minutes, 24 seconds)
2025-05-13 12:19:33,539 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:19:49,868 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3158.27588 ± 1290.491
2025-05-13 12:19:49,868 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4616.727, 2969.7012, 1457.0304, 1249.7974, 3466.1455, 4088.5208, 4083.6333, 4394.5586, 1205.6552, 4050.993]
2025-05-13 12:19:49,868 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:19:49,886 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 3 hours, 46 minutes, 9 seconds)
2025-05-13 12:23:45,080 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:24:00,990 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3013.63306 ± 1101.868
2025-05-13 12:24:00,990 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3925.5857, 4423.5806, 3576.1667, 1556.3324, 3384.346, 3125.7207, 1607.157, 4457.645, 1418.7316, 2661.0645]
2025-05-13 12:24:00,990 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:24:01,001 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 3 hours, 41 minutes, 56 seconds)
2025-05-13 12:27:55,878 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:28:12,276 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3086.66797 ± 1038.773
2025-05-13 12:28:12,276 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2760.635, 4326.6245, 2974.382, 1861.8496, 4463.5635, 4463.556, 1707.8815, 2904.4158, 1804.1498, 3599.624]
2025-05-13 12:28:12,276 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:28:12,287 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 3 hours, 37 minutes, 45 seconds)
2025-05-13 12:32:07,498 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:32:23,680 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3415.55664 ± 1111.826
2025-05-13 12:32:23,681 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1423.1943, 2954.6477, 4473.9062, 4211.502, 4010.3271, 2716.3022, 4128.7544, 1573.4119, 4298.0493, 4365.473]
2025-05-13 12:32:23,681 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:32:23,693 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 3 hours, 33 minutes, 35 seconds)
2025-05-13 12:36:19,213 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:36:35,244 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2637.74146 ± 1059.087
2025-05-13 12:36:35,244 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1451.0183, 1736.1029, 3031.11, 2440.2698, 2435.6565, 1929.5397, 4292.9785, 4575.031, 1390.068, 3095.6396]
2025-05-13 12:36:35,244 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:36:35,256 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 3 hours, 29 minutes, 26 seconds)
2025-05-13 12:40:30,880 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:40:46,857 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2511.10767 ± 756.031
2025-05-13 12:40:46,857 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2872.9507, 1308.4862, 2504.2043, 3690.189, 2892.5671, 1518.6304, 2774.9497, 3321.8484, 2649.7712, 1577.4807]
2025-05-13 12:40:46,857 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:40:46,868 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 3 hours, 25 minutes, 18 seconds)
2025-05-13 12:44:42,346 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:44:58,540 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3173.85303 ± 1112.505
2025-05-13 12:44:58,540 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4268.7393, 2774.985, 4021.913, 4486.833, 1423.9285, 4111.2114, 2084.653, 2863.6467, 4105.1323, 1597.4884]
2025-05-13 12:44:58,540 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:44:58,553 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 3 hours, 21 minutes, 12 seconds)
2025-05-13 12:48:53,768 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:49:09,793 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2142.46411 ± 1115.713
2025-05-13 12:49:09,793 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1233.0355, 3360.716, 2277.036, 1224.1992, 3023.2468, 1811.3999, 1152.7881, 1527.5411, 4619.4424, 1195.237]
2025-05-13 12:49:09,793 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:49:09,809 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 3 hours, 17 minutes)
2025-05-13 12:53:05,144 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:53:21,110 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3584.20557 ± 784.376
2025-05-13 12:53:21,110 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4234.325, 4202.92, 3759.9255, 4331.495, 3490.6404, 2094.1372, 2387.433, 4559.497, 3203.254, 3578.4272]
2025-05-13 12:53:21,110 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:53:21,118 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 3 hours, 12 minutes, 48 seconds)
2025-05-13 12:57:16,239 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 12:57:32,199 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2997.35034 ± 1034.122
2025-05-13 12:57:32,199 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2640.1196, 1819.6617, 3377.402, 1774.3763, 2008.9163, 4326.69, 4306.4683, 2186.163, 4553.997, 2979.7102]
2025-05-13 12:57:32,199 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:57:32,211 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 3 hours, 8 minutes, 32 seconds)
2025-05-13 13:01:27,504 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:01:43,858 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3185.50391 ± 1030.988
2025-05-13 13:01:43,858 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4515.489, 3328.4744, 2302.8054, 1914.347, 1752.7303, 4418.7925, 4575.5596, 2308.3875, 3143.2598, 3595.1926]
2025-05-13 13:01:43,858 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:01:43,870 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 3 hours, 4 minutes, 21 seconds)
2025-05-13 13:05:38,890 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:05:55,240 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3276.24292 ± 704.792
2025-05-13 13:05:55,240 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2839.242, 2498.5244, 3233.6582, 3607.549, 2182.524, 4237.799, 3959.9585, 2686.7786, 3183.9055, 4332.4873]
2025-05-13 13:05:55,240 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:05:55,251 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 3 hours, 7 seconds)
2025-05-13 13:09:50,352 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:10:06,392 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3220.89453 ± 889.175
2025-05-13 13:10:06,393 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2037.8195, 2341.67, 4112.5024, 3136.8286, 2854.3562, 3827.152, 2391.2837, 4388.5347, 4595.953, 2522.845]
2025-05-13 13:10:06,393 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:10:06,409 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 2 hours, 55 minutes, 55 seconds)
2025-05-13 13:14:01,702 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:14:17,713 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3247.56201 ± 907.130
2025-05-13 13:14:17,713 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3989.2566, 4200.17, 2586.2595, 3971.9895, 2735.922, 2821.548, 2056.9426, 1780.9911, 4400.2046, 3932.3333]
2025-05-13 13:14:17,713 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:14:17,725 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 51 minutes, 44 seconds)
2025-05-13 13:18:12,497 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:18:28,490 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3010.61060 ± 876.873
2025-05-13 13:18:28,490 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2164.152, 3362.7227, 1890.1172, 4290.7407, 2650.5972, 1881.7035, 3472.9758, 2908.1392, 2950.4053, 4534.551]
2025-05-13 13:18:28,491 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:18:28,501 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 2 hours, 47 minutes, 30 seconds)
2025-05-13 13:22:23,445 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:22:39,427 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3024.25415 ± 1114.977
2025-05-13 13:22:39,427 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4398.454, 3740.0579, 2058.2095, 1558.8971, 4345.372, 1680.8771, 4194.209, 3132.1033, 3462.2078, 1672.1542]
2025-05-13 13:22:39,427 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:22:39,437 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 2 hours, 43 minutes, 13 seconds)
2025-05-13 13:26:34,204 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:26:50,390 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3234.17944 ± 1132.002
2025-05-13 13:26:50,390 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2345.4514, 4102.919, 4037.8904, 1475.1932, 4289.0415, 1533.7026, 4395.7026, 4429.37, 2377.8643, 3354.6628]
2025-05-13 13:26:50,390 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:26:50,401 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 2 hours, 38 minutes, 59 seconds)
2025-05-13 13:30:45,613 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:31:01,997 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3879.00977 ± 782.860
2025-05-13 13:31:01,998 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3230.7627, 4462.5435, 4012.5073, 4104.8926, 4458.5254, 1762.7672, 3932.3635, 4288.193, 4195.6274, 4341.9126]
2025-05-13 13:31:01,998 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:31:01,998 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (3879.01) for latency MM1Queue_a033_s075
2025-05-13 13:31:02,009 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 2 hours, 34 minutes, 51 seconds)
2025-05-13 13:34:56,690 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:35:12,665 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3663.79614 ± 896.898
2025-05-13 13:35:12,666 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2675.767, 4609.0337, 2359.799, 4423.316, 3531.2852, 4434.5386, 3428.4133, 2284.1953, 4217.858, 4673.7544]
2025-05-13 13:35:12,666 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:35:12,678 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 2 hours, 30 minutes, 35 seconds)
2025-05-13 13:39:07,619 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:39:23,610 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3287.73291 ± 998.236
2025-05-13 13:39:23,610 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2203.8142, 4524.071, 4232.572, 3473.1553, 3983.1675, 2891.6868, 3707.4348, 2391.1067, 4167.4624, 1302.8563]
2025-05-13 13:39:23,610 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:39:23,623 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 2 hours, 26 minutes, 25 seconds)
2025-05-13 13:43:18,835 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:43:35,066 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3402.72021 ± 1061.634
2025-05-13 13:43:35,066 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2800.819, 4086.753, 4173.4346, 4317.2095, 4560.786, 1430.1531, 4506.4287, 1891.1285, 2878.757, 3381.7341]
2025-05-13 13:43:35,066 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:43:35,079 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 2 hours, 22 minutes, 18 seconds)
2025-05-13 13:47:29,776 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:47:46,083 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3953.69287 ± 351.336
2025-05-13 13:47:46,083 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4385.012, 3469.4395, 4101.943, 3671.3413, 4346.4717, 3515.4736, 4281.1055, 4280.0444, 3565.0405, 3921.0603]
2025-05-13 13:47:46,083 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:47:46,083 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (3953.69) for latency MM1Queue_a033_s075
2025-05-13 13:47:46,096 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 2 hours, 18 minutes, 7 seconds)
2025-05-13 13:51:40,902 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:51:57,060 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2955.43311 ± 1101.178
2025-05-13 13:51:57,060 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2558.0972, 2283.0984, 4322.6255, 2018.9294, 4253.287, 1676.257, 1480.4855, 4235.891, 4169.697, 2555.9622]
2025-05-13 13:51:57,060 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:51:57,071 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 2 hours, 13 minutes, 52 seconds)
2025-05-13 13:55:51,812 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 13:56:08,043 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3039.82910 ± 1251.622
2025-05-13 13:56:08,043 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4157.4976, 4457.8833, 4226.6147, 1236.2563, 3187.5469, 1393.898, 3060.2437, 2768.0466, 4524.334, 1385.9716]
2025-05-13 13:56:08,043 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:56:08,056 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 2 hours, 9 minutes, 43 seconds)
2025-05-13 14:00:03,083 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:00:19,119 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3318.43042 ± 971.534
2025-05-13 14:00:19,120 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3321.033, 2774.882, 4035.4517, 3283.8254, 2502.621, 4545.455, 1580.1287, 2308.493, 4466.1177, 4366.295]
2025-05-13 14:00:19,120 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:00:19,131 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 2 hours, 5 minutes, 33 seconds)
2025-05-13 14:04:14,100 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:04:30,168 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3628.12183 ± 473.595
2025-05-13 14:04:30,168 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4194.572, 4303.5166, 4074.685, 3845.8809, 3609.0671, 3125.4453, 3125.9846, 3830.3599, 2879.8655, 3291.8413]
2025-05-13 14:04:30,169 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:04:30,181 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 2 hours, 1 minute, 19 seconds)
2025-05-13 14:08:25,103 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:08:41,130 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3583.12451 ± 1109.882
2025-05-13 14:08:41,130 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1747.2981, 1521.6664, 2743.5535, 4494.857, 4370.8574, 3534.945, 4209.865, 4456.7627, 4123.6743, 4627.7686]
2025-05-13 14:08:41,130 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:08:41,143 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 57 minutes, 8 seconds)
2025-05-13 14:12:36,060 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:12:52,046 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3529.34131 ± 924.013
2025-05-13 14:12:52,046 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2683.263, 3688.6594, 4369.8057, 4104.562, 4220.386, 4577.398, 2966.8965, 1561.1644, 4264.744, 2856.5344]
2025-05-13 14:12:52,046 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:12:52,059 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 52 minutes, 56 seconds)
2025-05-13 14:16:47,151 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:17:03,377 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3199.91260 ± 1008.154
2025-05-13 14:17:03,377 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2799.9702, 4552.7803, 4534.821, 2580.976, 3739.082, 1647.2025, 4420.0957, 2331.3914, 2168.8381, 3223.9678]
2025-05-13 14:17:03,377 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:17:03,390 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 48 minutes, 47 seconds)
2025-05-13 14:20:58,359 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:21:14,326 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3186.56299 ± 919.560
2025-05-13 14:21:14,327 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2501.148, 3699.1, 4382.6123, 1589.119, 3408.335, 3559.7336, 3931.7112, 3176.5085, 3973.086, 1644.2766]
2025-05-13 14:21:14,327 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:21:14,338 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 44 minutes, 36 seconds)
2025-05-13 14:25:09,249 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:25:25,740 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2933.60767 ± 1036.578
2025-05-13 14:25:25,740 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4309.897, 2081.4148, 3100.2922, 1913.6816, 2886.4043, 2572.4956, 1295.2882, 4378.6426, 2472.0195, 4325.941]
2025-05-13 14:25:25,740 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:25:25,750 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 40 minutes, 26 seconds)
2025-05-13 14:29:20,752 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:29:36,780 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3765.11646 ± 992.526
2025-05-13 14:29:36,780 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4376.4683, 1687.6317, 4625.4497, 2222.351, 4413.743, 4117.997, 3997.2153, 4449.18, 4568.81, 3192.3174]
2025-05-13 14:29:36,780 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:29:36,795 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 36 minutes, 16 seconds)
2025-05-13 14:33:31,612 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:33:47,638 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3409.18018 ± 1003.834
2025-05-13 14:33:47,639 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4523.386, 1761.6401, 1328.2231, 3828.7222, 4133.6313, 3793.5767, 3653.9333, 3610.3855, 4293.4404, 3164.859]
2025-05-13 14:33:47,639 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:33:47,652 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 32 minutes, 4 seconds)
2025-05-13 14:37:42,180 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:37:58,203 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2677.33740 ± 1335.603
2025-05-13 14:37:58,204 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1227.6677, 1574.4451, 2444.5796, 4581.783, 3918.8633, 1391.5717, 4349.6904, 4190.362, 1593.9318, 1500.4779]
2025-05-13 14:37:58,204 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:37:58,218 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 27 minutes, 50 seconds)
2025-05-13 14:41:53,243 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:42:09,191 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3470.25781 ± 993.354
2025-05-13 14:42:09,192 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4297.646, 1663.6202, 4565.044, 2522.088, 2621.499, 4019.6353, 2440.2925, 4171.6, 3900.7534, 4500.3975]
2025-05-13 14:42:09,192 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:42:09,203 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 23 minutes, 39 seconds)
2025-05-13 14:46:03,991 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:46:19,959 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3294.12549 ± 1153.152
2025-05-13 14:46:19,959 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2051.465, 2001.993, 4211.859, 2223.0876, 3637.5308, 4330.981, 4328.363, 4350.2876, 4387.783, 1417.9033]
2025-05-13 14:46:19,959 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:46:19,969 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 1 hour, 19 minutes, 26 seconds)
2025-05-13 14:50:15,908 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:50:31,896 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3299.72412 ± 1114.004
2025-05-13 14:50:31,896 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2627.0493, 4494.7197, 4229.3037, 1963.6033, 4598.7896, 4529.451, 3975.9727, 1808.1893, 2788.3696, 1981.7954]
2025-05-13 14:50:31,896 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:50:31,909 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 1 hour, 15 minutes, 18 seconds)
2025-05-13 14:54:27,314 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:54:43,359 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3699.57080 ± 891.478
2025-05-13 14:54:43,359 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4406.3477, 2278.913, 4121.831, 4327.3486, 4490.5117, 4475.207, 4173.2407, 2068.0784, 2844.2324, 3809.996]
2025-05-13 14:54:43,359 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:54:43,379 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 1 hour, 11 minutes, 9 seconds)
2025-05-13 14:58:39,074 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 14:58:55,170 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3207.71924 ± 1137.479
2025-05-13 14:58:55,170 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1581.0911, 3473.0723, 4583.7314, 3563.9077, 1922.4672, 2889.6448, 3547.053, 1497.7471, 4613.777, 4404.7017]
2025-05-13 14:58:55,170 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:58:55,183 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 1 hour, 7 minutes, 2 seconds)
2025-05-13 15:02:51,132 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:03:07,578 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3276.63379 ± 1173.862
2025-05-13 15:03:07,578 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4403.9995, 4090.8276, 1527.3163, 1945.3119, 4549.504, 3116.8557, 1628.1294, 4544.411, 4172.764, 2787.219]
2025-05-13 15:03:07,578 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:03:07,592 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 1 hour, 2 minutes, 55 seconds)
2025-05-13 15:07:03,394 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:07:19,439 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2594.86963 ± 988.878
2025-05-13 15:07:19,439 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3922.5996, 1366.9264, 1879.4884, 3611.918, 3084.8254, 2535.4841, 1276.8492, 4023.4543, 1625.5442, 2621.6045]
2025-05-13 15:07:19,439 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:07:19,452 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 58 minutes, 46 seconds)
2025-05-13 15:11:15,588 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:11:31,663 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3749.62500 ± 1121.503
2025-05-13 15:11:31,663 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1463.4757, 4413.38, 4321.727, 1635.0829, 4153.9614, 4540.9707, 4573.253, 3769.3784, 4382.8994, 4242.122]
2025-05-13 15:11:31,663 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:11:31,676 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 54 minutes, 35 seconds)
2025-05-13 15:15:27,729 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:15:43,968 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3459.83350 ± 1217.909
2025-05-13 15:15:43,968 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4218.6816, 3246.033, 4614.4395, 4613.6426, 4432.241, 4051.2725, 4298.0254, 1612.744, 1350.1992, 2161.055]
2025-05-13 15:15:43,968 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:15:43,982 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 50 minutes, 25 seconds)
2025-05-13 15:19:40,226 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:19:56,608 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3139.09790 ± 852.223
2025-05-13 15:19:56,609 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3986.2131, 2533.8801, 2310.4077, 2804.0447, 4521.5537, 3918.8364, 2180.134, 4136.336, 2753.9949, 2245.5776]
2025-05-13 15:19:56,609 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:19:56,621 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 46 minutes, 15 seconds)
2025-05-13 15:23:52,534 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:24:08,498 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2790.45166 ± 1159.099
2025-05-13 15:24:08,498 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3554.7903, 4218.58, 4518.466, 2130.1902, 2022.4445, 1350.0781, 2864.6084, 1580.8123, 1557.0067, 4107.543]
2025-05-13 15:24:08,498 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:24:08,512 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 42 minutes, 1 second)
2025-05-13 15:28:03,604 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:28:19,804 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2599.67993 ± 1056.398
2025-05-13 15:28:19,804 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4564.7583, 1945.3912, 2329.799, 2349.515, 3400.9324, 2360.0789, 1421.4845, 1556.4136, 1773.0417, 4295.3867]
2025-05-13 15:28:19,804 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:28:19,821 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 37 minutes, 48 seconds)
2025-05-13 15:32:14,500 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:32:30,751 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3183.18140 ± 1034.465
2025-05-13 15:32:30,751 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1352.9584, 3669.1558, 3409.6477, 2412.1042, 4159.837, 2606.7056, 3714.2764, 4261.3657, 4488.7974, 1756.9675]
2025-05-13 15:32:30,751 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:32:30,771 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 33 minutes, 34 seconds)
2025-05-13 15:36:25,313 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:36:41,321 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3199.26440 ± 995.757
2025-05-13 15:36:41,321 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4176.9033, 4469.213, 4286.08, 1931.8641, 2852.4583, 2330.817, 2909.5996, 1691.1287, 4335.2144, 3009.3652]
2025-05-13 15:36:41,321 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:36:41,341 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 29 minutes, 20 seconds)
2025-05-13 15:40:35,491 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:40:51,546 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3417.63281 ± 902.753
2025-05-13 15:40:51,546 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4412.325, 4376.72, 4476.1787, 2931.9885, 2428.0098, 2975.0554, 4419.1875, 2447.2864, 2102.96, 3606.6182]
2025-05-13 15:40:51,546 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:40:51,563 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 25 minutes, 5 seconds)
2025-05-13 15:44:45,555 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:45:01,510 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3024.03369 ± 1045.641
2025-05-13 15:45:01,510 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1268.4065, 2967.3735, 2878.582, 3166.1628, 4632.304, 4350.4277, 1560.3527, 3881.968, 2292.5972, 3242.1619]
2025-05-13 15:45:01,510 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:45:01,525 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 20 minutes, 53 seconds)
2025-05-13 15:48:55,501 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:49:11,516 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3629.42920 ± 847.390
2025-05-13 15:49:11,516 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3031.2913, 3726.5977, 4645.0547, 3227.6526, 3956.1812, 4317.715, 4472.2476, 3372.7734, 1590.2693, 3954.5103]
2025-05-13 15:49:11,516 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:49:11,529 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 16 minutes, 41 seconds)
2025-05-13 15:53:05,874 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:53:22,229 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3103.89404 ± 1106.616
2025-05-13 15:53:22,230 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4039.4116, 4475.674, 4041.5564, 1708.3855, 2907.4158, 2762.6306, 4007.1736, 1362.1265, 3995.4805, 1739.0851]
2025-05-13 15:53:22,230 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:53:22,244 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 12 minutes, 30 seconds)
2025-05-13 15:57:16,772 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 15:57:32,682 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3727.51636 ± 658.619
2025-05-13 15:57:32,682 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4035.7632, 4374.7144, 3534.987, 4237.857, 4079.0164, 3047.8167, 4559.667, 2352.036, 3897.958, 3155.3481]
2025-05-13 15:57:32,682 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:57:32,699 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 8 minutes, 20 seconds)
2025-05-13 16:01:27,276 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 16:01:43,276 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3521.68555 ± 1093.935
2025-05-13 16:01:43,276 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1571.1653, 3860.4365, 4301.3984, 1603.3718, 3864.9224, 4216.0664, 2642.121, 4681.8916, 4323.2334, 4152.247]
2025-05-13 16:01:43,276 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 16:01:43,292 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 4 minutes, 10 seconds)
2025-05-13 16:05:38,081 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-13 16:05:54,274 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3141.80859 ± 1213.369
2025-05-13 16:05:54,274 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1882.4476, 1791.8486, 4224.8027, 1572.5092, 3930.126, 4069.2888, 3907.9026, 1436.4075, 4222.2163, 4380.5366]
2025-05-13 16:05:54,275 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 16:05:54,298 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1251 [DEBUG]: Training session finished
