2025-09-11 18:24:59,998 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc4/noiseperc10-halfcheetah/ExtremeClogL1U23-mbpac_memdelay
2025-09-11 18:24:59,998 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc4/noiseperc10-halfcheetah/ExtremeClogL1U23-mbpac_memdelay
2025-09-11 18:24:59,998 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeClogL1U23': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x14f279ed7110>}
2025-09-11 18:24:59,998 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1111 [DEBUG]: using device: cuda
2025-09-11 18:25:00,011 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1133 [INFO]: Creating new trainer
2025-09-11 18:25:00,032 baseline-mbpac-noiseperc10-halfcheetah:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
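The printed `pi network` is a standard squashed-Gaussian actor: a shared two-layer ReLU trunk over the 384-dim input, separate linear heads for mean and log-std over the 6-dim action, and a final tanh refit. A minimal PyTorch sketch of an equivalent module — the class and layer names here are illustrative, not the project's own `NNGaussianPolicy`, and the `NNTanhRefit(scale=2, shift=-1)` semantics are assumed to be the affine map `shift + scale * (tanh(x) + 1) / 2`:

```python
import torch
import torch.nn as nn

class SquashedGaussianPolicy(nn.Module):
    """Illustrative stand-in for the logged NNGaussianPolicy (384 -> 6 actions)."""
    def __init__(self, in_features=384, hidden=256, act_dim=6,
                 scale=2.0, shift=-1.0):
        super().__init__()
        self.common_head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(in_features, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mu_head = nn.Linear(hidden, act_dim)
        self.log_std_head = nn.Linear(hidden, act_dim)
        self.scale, self.shift = scale, shift

    def forward(self, obs):
        h = self.common_head(obs)
        mu, log_std = self.mu_head(h), self.log_std_head(h)
        std = log_std.clamp(-20.0, 2.0).exp()      # SAC-style clamp (assumption)
        pre = mu + std * torch.randn_like(std)     # reparameterised sample
        # assumed tanh refit: affine map of tanh onto [shift, shift + scale]
        action = self.shift + self.scale * (torch.tanh(pre) + 1.0) / 2.0
        return action, mu, log_std
```

Note that with `scale=2, shift=-1` the affine map collapses to plain `tanh`, i.e. actions land in (-1, 1), which matches HalfCheetah's action bounds.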
2025-09-11 18:25:00,032 baseline-mbpac-noiseperc10-halfcheetah:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
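The `q network` is a plain concat-MLP critic: both inputs are flattened, concatenated on the last dim (17-dim state + 6-dim action = the 23 input features shown), passed through two hidden ReLU layers to a scalar, and squeezed. A hedged sketch of what `NNLayerConcat2` plus the printed `next` stack computes (names illustrative):

```python
import torch
import torch.nn as nn

class ConcatQ(nn.Module):
    """Illustrative concat-MLP critic matching the logged shapes (17 + 6 = 23 in)."""
    def __init__(self, state_dim=17, act_dim=6, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        # both sides flattened, then concatenated on the last dim (NNLayerConcat2)
        x = torch.cat([state.flatten(1), action.flatten(1)], dim=-1)
        return self.net(x).squeeze(-1)  # NNLayerSqueeze(dim=-1): (B, 1) -> (B,)
```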
2025-09-11 18:25:00,040 baseline-mbpac-noiseperc10-halfcheetah:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=6, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 384, batch_first=True)
)
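The dynamics model is a GRU-based predictive recurrent network: states are embedded to the 384-dim recurrent size, actions to the 256-dim GRU input, and a Gaussian emitter decodes each 384-dim hidden state into mean and log-std over the 17-dim next state. A rough sketch of the forward pass under those shape assumptions — the `ClipSiLU` layers are replaced by plain SiLU, the emitter heads are simplified to single linears, and using the state embedding as the initial GRU hidden is an assumption:

```python
import torch
import torch.nn as nn

class PredictiveRecurrentModel(nn.Module):
    """Illustrative GRU world model matching the logged shapes (17-d state, 6-d action)."""
    def __init__(self, state_dim=17, act_dim=6, hidden=256, rec_dim=384):
        super().__init__()
        self.embed_state = nn.Sequential(   # 17 -> 384, assumed initial GRU hidden
            nn.Linear(state_dim, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, rec_dim),
        )
        self.embed_action = nn.Sequential(  # 6 -> 256, the per-step GRU input
            nn.Linear(act_dim, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden),
        )
        self.rec = nn.GRU(hidden, rec_dim, batch_first=True)
        self.emit_mu = nn.Linear(rec_dim, state_dim)       # simplified emitter heads
        self.emit_log_std = nn.Linear(rec_dim, state_dim)

    def forward(self, state, actions):
        # state: (B, 17); actions: (B, T, 6) -> next-state Gaussian params per step
        h0 = self.embed_state(state).unsqueeze(0)          # (1, B, 384)
        out, _ = self.rec(self.embed_action(actions), h0)  # (B, T, 384)
        return self.emit_mu(out), self.emit_log_std(out)
```

Rolling the GRU forward over an action sequence like this is what lets the agent predict through the queued-but-unobserved actions of a delayed MDP.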
2025-09-11 18:25:01,874 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1194 [DEBUG]: Starting training session...
2025-09-11 18:25:01,875 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 1/100
2025-09-11 18:35:34,756 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 18:35:34,765 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 18:40:08,843 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: -337.13336 ± 34.131
2025-09-11 18:40:08,843 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [-299.74716, -381.10843, -343.7838, -366.21017, -290.38208, -278.59863, -360.5888, -362.15726, -328.02682, -360.73038]
2025-09-11 18:40:08,843 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 18:40:08,844 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1226 [INFO]: New best (-337.13) for latency ExtremeClogL1U23
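Each `Total Reward` line is the mean ± population standard deviation of the ten per-episode returns printed on the following `All rewards` line; the iteration-1 figures can be reproduced from the logged list with the standard library:

```python
from statistics import fmean, pstdev

rewards = [-299.74716, -381.10843, -343.7838, -366.21017, -290.38208,
           -278.59863, -360.5888, -362.15726, -328.02682, -360.73038]

mean_r = fmean(rewards)   # ≈ -337.133, matching "Total Reward: -337.13336"
std_r = pstdev(rewards)   # ≈ 34.131 (ddof=0, i.e. population std)
print(f"{mean_r:.5f} ± {std_r:.3f}")
```

The ±34.131 matches `pstdev` (divisor n), not the sample standard deviation (divisor n-1), which would give ≈ 35.98.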
2025-09-11 18:40:08,862 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 24 hours, 56 minutes, 31 seconds)
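The ETA printed at each iteration boundary is consistent with scaling the wall-clock duration of the previous iteration by the number of remaining iterations: iteration 1 ran from 18:25:01,875 to 18:40:08,862, and 99 remaining iterations at that pace yields the logged "24 hours, 56 minutes, 31 seconds". A stdlib sketch of that estimate (the exact formula the training loop uses is an assumption):

```python
from datetime import datetime

fmt = "%Y-%m-%d %H:%M:%S,%f"
t_start = datetime.strptime("2025-09-11 18:25:01,875", fmt)
t_end = datetime.strptime("2025-09-11 18:40:08,862", fmt)

remaining = 100 - 1                               # iterations still to run
eta = int((t_end - t_start).total_seconds() * remaining)
h, rem = divmod(eta, 3600)
m, s = divmod(rem, 60)
print(f"{h} hours, {m} minutes, {s} seconds")     # 24 hours, 56 minutes, 31 seconds
```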
2025-09-11 18:51:32,855 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 18:51:32,865 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 18:55:58,464 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: -184.70851 ± 51.137
2025-09-11 18:55:58,465 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [-130.15851, -216.93387, -314.60803, -150.08052, -174.93774, -140.4339, -145.18166, -203.03047, -193.14104, -178.57945]
2025-09-11 18:55:58,465 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 18:55:58,465 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1226 [INFO]: New best (-184.71) for latency ExtremeClogL1U23
2025-09-11 18:55:58,471 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 25 hours, 16 minutes, 13 seconds)
2025-09-11 19:07:24,652 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 19:07:24,662 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 19:11:51,233 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 117.87223 ± 95.241
2025-09-11 19:11:51,233 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [173.36102, 32.405293, 141.77849, 40.461205, 170.36473, 188.75523, 258.83026, 52.85051, 192.57802, -72.66253]
2025-09-11 19:11:51,233 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 19:11:51,233 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1226 [INFO]: New best (117.87) for latency ExtremeClogL1U23
2025-09-11 19:11:51,253 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 25 hours, 13 minutes, 56 seconds)
2025-09-11 19:23:11,566 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 19:23:11,588 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 19:27:37,510 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 987.96008 ± 341.887
2025-09-11 19:27:37,510 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [316.61743, 876.5481, 1446.7794, 944.2767, 1228.6539, 455.87018, 1020.2998, 1251.0635, 1258.9578, 1080.5333]
2025-09-11 19:27:37,510 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 19:27:37,510 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1226 [INFO]: New best (987.96) for latency ExtremeClogL1U23
2025-09-11 19:27:37,514 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 25 hours, 2 minutes, 15 seconds)
2025-09-11 19:38:59,849 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 19:38:59,853 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 19:43:33,274 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1386.01831 ± 610.217
2025-09-11 19:43:33,275 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1780.7493, 340.42728, 852.33777, 1785.8849, 2032.2648, 347.33786, 1780.549, 1636.9663, 1357.3604, 1946.3052]
2025-09-11 19:43:33,275 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 19:43:33,275 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1226 [INFO]: New best (1386.02) for latency ExtremeClogL1U23
2025-09-11 19:43:33,280 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 24 hours, 51 minutes, 56 seconds)
2025-09-11 19:54:58,133 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 19:54:58,139 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 19:59:25,453 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1360.93701 ± 1025.292
2025-09-11 19:59:25,454 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [220.93295, 2164.913, 2434.2588, 1779.1364, 212.79759, 227.23065, 1408.8203, 2481.0266, 40.497257, 2639.7568]
2025-09-11 19:59:25,454 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 19:59:25,482 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 24 hours, 50 minutes, 24 seconds)
2025-09-11 20:10:49,296 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:10:49,301 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:15:17,700 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1881.64331 ± 682.240
2025-09-11 20:15:17,707 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2241.054, 217.45952, 2445.348, 1548.2338, 2429.4956, 2236.7048, 1627.5018, 2381.5225, 2384.6873, 1304.4282]
2025-09-11 20:15:17,707 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 20:15:17,707 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1226 [INFO]: New best (1881.64) for latency ExtremeClogL1U23
2025-09-11 20:15:17,726 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 24 hours, 35 minutes, 22 seconds)
2025-09-11 20:26:36,671 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:26:36,676 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:31:07,900 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2882.05933 ± 126.160
2025-09-11 20:31:07,901 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2946.1553, 2832.988, 2995.5227, 2753.7844, 2959.2842, 3008.8496, 2593.9453, 2974.5366, 2808.1865, 2947.3394]
2025-09-11 20:31:07,901 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 20:31:07,901 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1226 [INFO]: New best (2882.06) for latency ExtremeClogL1U23
2025-09-11 20:31:07,907 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 24 hours, 18 minutes, 42 seconds)
2025-09-11 20:42:32,083 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:42:32,088 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:47:04,526 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2244.40186 ± 1068.534
2025-09-11 20:47:04,533 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [631.9256, 2725.9316, 2962.5535, 3360.217, 3188.6316, 2227.0618, 201.20035, 2201.2346, 3381.5671, 1563.6973]
2025-09-11 20:47:04,533 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 20:47:04,538 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 24 hours, 5 minutes, 59 seconds)
2025-09-11 20:58:30,766 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:58:30,776 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:02:58,760 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2701.56763 ± 1016.030
2025-09-11 21:02:58,760 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3545.9136, 3020.1172, 831.115, 3113.3564, 3200.17, 2048.9766, 3491.9475, 3674.8906, 876.08044, 3213.109]
2025-09-11 21:02:58,760 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 21:02:58,763 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 23 hours, 49 minutes, 38 seconds)
2025-09-11 21:14:21,120 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:14:21,124 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:18:47,386 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3054.14380 ± 698.271
2025-09-11 21:18:47,392 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3113.7422, 3143.832, 2310.5144, 3382.4233, 3606.983, 3319.952, 1264.7081, 3332.8289, 3743.4678, 3322.9836]
2025-09-11 21:18:47,392 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 21:18:47,392 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1226 [INFO]: New best (3054.14) for latency ExtremeClogL1U23
2025-09-11 21:18:47,402 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 23 hours, 32 minutes, 42 seconds)
2025-09-11 21:30:17,943 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:30:17,953 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:34:49,033 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2415.46729 ± 1316.123
2025-09-11 21:34:49,033 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [606.0384, 3565.8306, 3924.611, 376.7902, 3624.229, 1254.121, 1517.0399, 3756.5645, 3326.5054, 2202.942]
2025-09-11 21:34:49,033 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 21:34:49,038 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 23 hours, 19 minutes, 35 seconds)
2025-09-11 21:46:22,154 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:46:22,165 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:50:52,210 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2542.71655 ± 1252.197
2025-09-11 21:50:52,211 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1501.1989, 1709.3359, 3125.7183, 3917.2378, 3661.3237, 1954.935, 3864.2302, 1746.5756, 96.33715, 3850.2737]
2025-09-11 21:50:52,212 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 21:50:52,218 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 23 hours, 7 minutes, 27 seconds)
2025-09-11 22:02:25,642 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:02:25,652 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:06:58,400 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3583.19385 ± 709.506
2025-09-11 22:06:58,402 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [4192.946, 4111.978, 3885.1145, 3908.3623, 3760.2268, 3666.006, 4021.338, 2064.0227, 2335.8413, 3886.102]
2025-09-11 22:06:58,402 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 22:06:58,402 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1226 [INFO]: New best (3583.19) for latency ExtremeClogL1U23
2025-09-11 22:06:58,405 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 22 hours, 54 minutes, 14 seconds)
2025-09-11 22:18:37,548 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:18:37,558 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:23:11,260 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3773.25317 ± 378.025
2025-09-11 22:23:11,265 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3099.3535, 4147.3774, 3660.0884, 4015.4316, 4157.237, 3947.4236, 3845.517, 4155.419, 3552.4429, 3152.2437]
2025-09-11 22:23:11,265 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 22:23:11,265 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1226 [INFO]: New best (3773.25) for latency ExtremeClogL1U23
2025-09-11 22:23:11,289 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 22 hours, 43 minutes, 32 seconds)
2025-09-11 22:34:46,235 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:34:46,246 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:39:19,887 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2492.73779 ± 1748.805
2025-09-11 22:39:19,888 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3900.0645, 4421.0596, 465.04712, 14.875909, 588.28345, 3517.5361, 4384.2837, 4398.7046, 2398.9512, 838.5707]
2025-09-11 22:39:19,888 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 22:39:19,891 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 22 hours, 33 minutes, 5 seconds)
2025-09-11 22:50:58,320 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:50:58,329 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:55:31,269 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3309.78516 ± 1394.692
2025-09-11 22:55:31,288 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3035.0837, 1112.7281, 4380.223, 3589.847, 282.10098, 3421.276, 4536.371, 4184.7666, 4306.3203, 4249.1343]
2025-09-11 22:55:31,288 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 22:55:31,303 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 22 hours, 19 minutes, 41 seconds)
2025-09-11 23:07:11,340 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:07:11,350 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:11:44,465 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2679.31885 ± 1756.138
2025-09-11 23:11:44,466 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [4235.629, 2483.5964, 4552.9077, 4036.3699, 1254.8792, 216.57307, 4520.78, 505.37296, 4330.3433, 656.7368]
2025-09-11 23:11:44,466 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 23:11:44,473 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 22 hours, 6 minutes, 16 seconds)
2025-09-11 23:23:20,632 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:23:20,642 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:27:57,538 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2826.16382 ± 1629.623
2025-09-11 23:27:57,538 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2212.9084, 593.2863, 2174.0588, 132.44641, 3091.5786, 1749.4762, 4506.1553, 4621.702, 4502.872, 4677.1543]
2025-09-11 23:27:57,538 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 23:27:57,543 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 21 hours, 51 minutes, 58 seconds)
2025-09-11 23:39:30,233 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:39:30,252 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:43:57,827 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2667.20923 ± 1664.388
2025-09-11 23:43:57,829 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [4596.2446, 3899.1543, 3997.051, 386.32486, 4559.01, 849.7816, 854.5928, 2075.8516, 4303.064, 1151.0155]
2025-09-11 23:43:57,829 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 23:43:57,837 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 21 hours, 32 minutes, 24 seconds)
2025-09-11 23:55:39,832 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:55:39,854 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:00:13,123 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3615.89722 ± 1201.801
2025-09-12 00:00:13,123 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [4854.4946, 4718.0825, 4570.4766, 4089.8347, 4360.9897, 2430.2678, 3401.166, 1692.756, 1581.0223, 4459.884]
2025-09-12 00:00:13,123 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 00:00:13,133 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 21 hours, 18 minutes, 1 second)
2025-09-12 00:11:41,042 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:11:41,047 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:16:13,373 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3861.50781 ± 1115.324
2025-09-12 00:16:13,374 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [4186.303, 4687.3125, 4172.171, 4523.9824, 4562.6987, 3643.4978, 4258.996, 980.5245, 4827.3555, 2772.2324]
2025-09-12 00:16:13,374 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 00:16:13,374 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1226 [INFO]: New best (3861.51) for latency ExtremeClogL1U23
2025-09-12 00:16:13,380 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 20 hours, 58 minutes, 56 seconds)
2025-09-12 00:27:41,254 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:27:41,258 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:32:06,136 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3141.41162 ± 1576.689
2025-09-12 00:32:06,137 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [950.89923, 4439.3667, 4878.5093, 2585.606, 4999.407, 1863.5712, 4293.684, 1319.893, 1413.797, 4669.3833]
2025-09-12 00:32:06,137 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 00:32:06,146 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 20 hours, 37 minutes, 33 seconds)
2025-09-12 00:43:29,973 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:43:29,987 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:48:01,993 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3769.74023 ± 1628.449
2025-09-12 00:48:01,994 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [4551.6245, 4823.224, 4884.6904, 2225.9158, 572.4157, 4464.828, 5132.537, 4918.7065, 1289.0402, 4834.4175]
2025-09-12 00:48:01,994 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 00:48:01,998 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 20 hours, 17 minutes, 7 seconds)
2025-09-12 00:59:26,934 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:59:26,939 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:03:59,096 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2932.39990 ± 1749.485
2025-09-12 01:03:59,097 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [4975.382, 1105.7406, 1400.2147, 1699.8547, 4598.3604, 4747.2227, 1330.2379, 4254.333, 528.62616, 4684.025]
2025-09-12 01:03:59,097 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 01:03:59,101 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 20 hours, 18 seconds)
2025-09-12 01:15:22,832 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:15:22,842 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:19:53,500 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2967.28784 ± 1517.820
2025-09-12 01:19:53,501 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [4734.765, 5083.114, 2834.4746, 3526.0764, 4861.6177, 3039.5852, 1763.0251, 1894.01, 290.6689, 1645.5438]
2025-09-12 01:19:53,501 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 01:19:53,507 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 19 hours, 39 minutes, 9 seconds)
2025-09-12 01:31:19,220 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:31:19,228 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:35:56,160 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3299.93823 ± 1533.897
2025-09-12 01:35:56,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2141.3503, 5248.0728, 1911.4705, 3137.9053, 675.58136, 4782.294, 4076.3633, 5145.3955, 1698.4681, 4182.4814]
2025-09-12 01:35:56,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 01:35:56,171 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 19 hours, 23 minutes, 48 seconds)
2025-09-12 01:47:24,619 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:47:24,628 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:51:56,314 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3748.12256 ± 1199.266
2025-09-12 01:51:56,315 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [4349.948, 2757.5266, 5074.984, 4941.9185, 4240.4863, 5020.8013, 1487.3771, 3472.454, 2098.6877, 4037.0444]
2025-09-12 01:51:56,315 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 01:51:56,320 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 19 hours, 9 minutes, 38 seconds)
2025-09-12 02:03:19,248 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:03:19,258 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:07:44,973 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3031.32910 ± 1183.705
2025-09-12 02:07:44,974 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1174.6917, 4138.5474, 2709.5957, 2972.6836, 4079.902, 1671.2078, 5027.41, 4026.9753, 2027.3802, 2484.895]
2025-09-12 02:07:44,974 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 02:07:44,994 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 18 hours, 51 minutes, 58 seconds)
2025-09-12 02:19:02,900 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:19:02,910 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:23:28,513 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3159.14941 ± 1625.961
2025-09-12 02:23:28,514 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [4715.638, 2204.932, 2013.353, 4628.2207, 1178.6969, 5327.549, 4999.0015, 3793.6558, 1991.774, 738.6737]
2025-09-12 02:23:28,514 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 02:23:28,524 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 18 hours, 32 minutes, 51 seconds)
2025-09-12 02:34:40,232 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:34:40,247 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:39:03,539 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4134.61035 ± 1183.661
2025-09-12 02:39:03,543 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3351.816, 5026.888, 5002.6914, 5161.6245, 1344.2891, 5257.8184, 3055.7188, 4491.088, 4763.3276, 3890.8438]
2025-09-12 02:39:03,543 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 02:39:03,543 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1226 [INFO]: New best (4134.61) for latency ExtremeClogL1U23
2025-09-12 02:39:03,556 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 18 hours, 12 minutes, 30 seconds)
2025-09-12 02:50:14,304 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:50:14,314 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:54:36,737 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3915.40234 ± 1464.154
2025-09-12 02:54:36,737 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3272.8171, 4605.249, 5030.5933, 5094.456, 4729.8345, 4553.802, 4159.0884, 2737.7573, 4862.9097, 107.51643]
2025-09-12 02:54:36,738 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 02:54:36,742 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 17 hours, 49 minutes, 59 seconds)
2025-09-12 03:06:38,794 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:06:38,818 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:11:30,611 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4779.00000 ± 1040.552
2025-09-12 03:11:30,612 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [5046.381, 4993.453, 4708.6353, 5229.644, 5332.502, 1707.531, 5310.051, 5195.396, 5299.5693, 4966.8374]
2025-09-12 03:11:30,612 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 03:11:30,612 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1226 [INFO]: New best (4779.00) for latency ExtremeClogL1U23
2025-09-12 03:11:30,619 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 17 hours, 46 minutes, 15 seconds)
2025-09-12 03:23:44,455 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:23:44,467 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:28:32,143 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2690.22900 ± 1776.934
2025-09-12 03:28:32,144 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3116.431, 4857.6313, 1818.1664, 380.00568, 794.3192, 1373.1868, 5128.043, 4901.735, 3693.9734, 838.8013]
2025-09-12 03:28:32,144 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 03:28:32,154 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 17 hours, 46 minutes, 22 seconds)
2025-09-12 03:40:46,862 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:40:46,880 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:45:35,786 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2793.70508 ± 1823.258
2025-09-12 03:45:35,788 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [4276.449, 2746.752, 1113.0936, 4794.7246, 129.45267, 1276.8738, 549.28674, 4858.541, 5125.9424, 3065.933]
2025-09-12 03:45:35,788 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 03:45:35,796 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 17 hours, 47 minutes, 34 seconds)
2025-09-12 03:57:51,969 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:57:51,980 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:02:41,335 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3372.88672 ± 1869.583
2025-09-12 04:02:41,345 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1324.7734, 4117.248, 762.07715, 580.9597, 5105.9453, 5323.8013, 5188.0083, 5255.5435, 3961.7437, 2108.7656]
2025-09-12 04:02:41,346 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 04:02:41,355 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 17 hours, 50 minutes, 27 seconds)
2025-09-12 04:14:58,453 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:14:58,458 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:19:49,191 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3629.40576 ± 1947.744
2025-09-12 04:19:49,192 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2301.6765, 5312.2207, 5284.269, 4901.96, 404.2949, 941.3884, 5516.6294, 5039.043, 4967.6616, 1624.9185]
2025-09-12 04:19:49,192 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 04:19:49,197 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 17 hours, 53 minutes, 36 seconds)
2025-09-12 04:32:06,849 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:32:06,860 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:36:58,205 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3394.48389 ± 1881.731
2025-09-12 04:36:58,215 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [4992.964, 5168.6963, 5089.6914, 5433.454, 644.1283, 1743.4652, 3857.24, 4750.718, 1351.0248, 913.46063]
2025-09-12 04:36:58,215 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 04:36:58,254 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 17 hours, 39 minutes, 42 seconds)
2025-09-12 04:49:16,381 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:49:16,387 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:54:05,373 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3843.20947 ± 1714.892
2025-09-12 04:54:05,380 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1016.6626, 5485.359, 5338.829, 5270.995, 5412.326, 2206.6797, 2070.1382, 5222.198, 4516.954, 1891.9545]
2025-09-12 04:54:05,380 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 04:54:05,416 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 17 hours, 23 minutes, 45 seconds)
2025-09-12 05:06:24,726 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:06:24,753 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:11:17,668 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4635.99902 ± 1053.182
2025-09-12 05:11:17,669 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [4047.5833, 5286.438, 3954.7473, 2834.8123, 5497.9136, 4502.9844, 5621.79, 3189.0261, 5227.015, 6197.6763]
2025-09-12 05:11:17,669 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 05:11:17,700 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 17 hours, 8 minutes, 22 seconds)
2025-09-12 05:23:39,021 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:23:39,026 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:28:30,638 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3998.35864 ± 1933.925
2025-09-12 05:28:30,644 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2443.2969, 3874.0134, 6107.9214, 5579.5415, 5171.7573, 5714.6377, 1631.959, 287.03046, 3239.8728, 5933.56]
2025-09-12 05:28:30,644 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 05:28:30,650 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 16 hours, 52 minutes, 41 seconds)
2025-09-12 05:40:51,728 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:40:51,740 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:45:42,819 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2996.75171 ± 1886.469
2025-09-12 05:45:42,821 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2395.7212, 269.45483, 5586.023, 5557.4863, 5599.1543, 1151.8306, 3447.7268, 2137.1487, 2636.8892, 1186.0812]
2025-09-12 05:45:42,821 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 05:45:42,864 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 16 hours, 36 minutes, 22 seconds)
2025-09-12 05:58:04,713 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:58:04,717 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:02:56,401 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3417.24219 ± 2093.122
2025-09-12 06:02:56,403 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [415.92642, 806.23145, 5680.5, 5346.5845, 5320.762, 1062.4595, 5023.306, 1703.8606, 5524.7144, 3288.075]
2025-09-12 06:02:56,403 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 06:02:56,411 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 16 hours, 20 minutes, 2 seconds)
2025-09-12 06:15:18,210 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:15:18,214 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:20:09,700 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3078.95166 ± 2280.549
2025-09-12 06:20:09,700 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [5514.747, 109.639786, 5702.8228, 2961.3708, 2328.6768, 5640.4204, 5365.2847, 74.48838, 82.68747, 3009.3804]
2025-09-12 06:20:09,700 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 06:20:09,708 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 16 hours, 4 minutes)
2025-09-12 06:32:32,057 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:32:32,068 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:37:23,025 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4561.53467 ± 1506.535
2025-09-12 06:37:23,026 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [5385.874, 5080.395, 5528.4897, 2672.0974, 5010.441, 4779.9785, 5970.648, 6312.8496, 3644.7593, 1229.8181]
2025-09-12 06:37:23,026 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 06:37:23,030 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 15 hours, 46 minutes, 58 seconds)
2025-09-12 06:49:43,842 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:49:43,847 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:54:36,697 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3295.03467 ± 1970.321
2025-09-12 06:54:36,700 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [5120.4126, 70.71809, 439.00067, 2453.9097, 2096.4246, 5049.2183, 2239.7854, 5435.667, 4863.9736, 5181.2383]
2025-09-12 06:54:36,700 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 06:54:36,710 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 15 hours, 29 minutes, 53 seconds)
2025-09-12 07:06:59,709 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:06:59,714 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 07:11:49,793 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4076.57568 ± 2083.195
2025-09-12 07:11:49,793 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [6152.67, 5641.799, 5551.5864, 5183.465, 5823.3423, 336.03635, 1574.0074, 5661.5513, 1437.2783, 3404.0227]
2025-09-12 07:11:49,793 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 07:11:49,799 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 15 hours, 12 minutes, 49 seconds)
2025-09-12 07:24:10,448 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:24:10,452 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 07:29:01,129 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4035.66064 ± 1887.594
2025-09-12 07:29:01,147 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [5363.0776, 5870.79, 666.68414, 5371.1943, 3559.646, 2936.551, 788.44183, 5378.118, 5852.2983, 4569.8037]
2025-09-12 07:29:01,148 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 07:29:01,161 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 14 hours, 55 minutes, 13 seconds)
2025-09-12 07:41:24,385 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:41:24,390 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 07:46:14,844 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2850.23486 ± 2351.960
2025-09-12 07:46:14,845 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [5396.8496, 357.00226, 572.4768, 5220.629, 6381.951, 495.4153, 1577.6351, 2187.4836, 753.92804, 5558.9795]
2025-09-12 07:46:14,845 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 07:46:14,850 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 14 hours, 38 minutes, 4 seconds)
2025-09-12 07:58:37,335 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:58:37,341 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 08:03:28,029 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3366.78979 ± 1976.703
2025-09-12 08:03:28,040 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1335.6444, 1459.6433, 910.39264, 5987.053, 5723.5737, 5796.668, 2738.9224, 5001.132, 1395.7919, 3319.0752]
2025-09-12 08:03:28,040 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 08:03:28,063 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 14 hours, 20 minutes, 50 seconds)
2025-09-12 08:15:51,246 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:15:51,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 08:20:41,828 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3482.67651 ± 2080.625
2025-09-12 08:20:41,836 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1446.3303, 1260.9941, 2773.898, 6155.1436, 5655.3223, 2513.0403, 6183.597, 5568.911, 2736.009, 533.52057]
2025-09-12 08:20:41,836 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 08:20:41,844 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 14 hours, 3 minutes, 38 seconds)
2025-09-12 08:33:05,010 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:33:05,022 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 08:37:55,651 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4258.54980 ± 1687.156
2025-09-12 08:37:55,652 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2897.9644, 5922.0483, 3238.5427, 5036.2354, 5574.4087, 6234.602, 835.38605, 6066.277, 3202.3362, 3577.7012]
2025-09-12 08:37:55,652 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 08:37:55,659 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 13 hours, 46 minutes, 32 seconds)
2025-09-12 08:50:18,352 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:50:18,359 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 08:55:08,687 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4731.52588 ± 1583.099
2025-09-12 08:55:08,698 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1580.041, 5010.8574, 5239.728, 6004.0547, 5413.277, 5687.0, 5393.2925, 5641.142, 1636.9738, 5708.894]
2025-09-12 08:55:08,698 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 08:55:08,711 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 13 hours, 29 minutes, 34 seconds)
2025-09-12 09:07:30,839 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:07:30,844 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 09:12:22,751 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2831.00439 ± 2407.724
2025-09-12 09:12:22,755 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [5443.29, 6441.021, 929.45197, -19.691437, 4366.9634, 6146.0024, 2695.5173, 444.49716, 714.32074, 1148.6738]
2025-09-12 09:12:22,755 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 09:12:22,765 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 13 hours, 12 minutes, 24 seconds)
2025-09-12 09:24:45,890 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:24:45,895 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 09:29:37,500 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3687.02002 ± 2452.978
2025-09-12 09:29:37,508 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1499.2728, 6157.9077, 6255.7236, 5267.207, 1284.4722, 5659.666, 5486.7695, 216.90388, 19.008978, 5023.2676]
2025-09-12 09:29:37,508 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 09:29:37,516 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 12 hours, 55 minutes, 25 seconds)
2025-09-12 09:42:00,732 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:42:00,737 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 09:46:52,418 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4163.33105 ± 1952.625
2025-09-12 09:46:52,424 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [5111.9414, 6218.7466, 4930.9272, 1780.693, 4383.2886, 6230.4614, 3546.7512, 6227.6167, 3125.7847, 77.09762]
2025-09-12 09:46:52,424 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 09:46:52,431 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 12 hours, 38 minutes, 21 seconds)
2025-09-12 09:59:17,682 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:59:17,687 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 10:04:09,806 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3687.18872 ± 1878.743
2025-09-12 10:04:09,807 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [4672.812, 5459.348, 839.89526, 5966.254, 3355.237, 5459.292, 2483.6057, 5605.7617, 1334.2136, 1695.4685]
2025-09-12 10:04:09,807 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 10:04:09,813 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 12 hours, 21 minutes, 37 seconds)
2025-09-12 10:16:32,881 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:16:32,886 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 10:21:25,069 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4618.64014 ± 1795.020
2025-09-12 10:21:25,072 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [6457.7607, 5744.004, 3094.194, 411.9764, 6043.5806, 5937.011, 3469.2827, 3854.8997, 5539.474, 5634.2188]
2025-09-12 10:21:25,072 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 10:21:25,242 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 12 hours, 4 minutes, 42 seconds)
2025-09-12 10:33:47,753 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:33:47,761 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 10:38:38,785 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4469.15576 ± 1879.292
2025-09-12 10:38:38,795 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [6239.6357, 4969.182, 5852.1, 5424.836, 992.2895, 4158.833, 6221.621, 2796.972, 6325.389, 1710.7006]
2025-09-12 10:38:38,795 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 10:38:38,805 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 11 hours, 47 minutes, 23 seconds)
2025-09-12 10:51:01,743 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:51:01,748 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 10:55:55,276 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 5180.30957 ± 1766.792
2025-09-12 10:55:55,285 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [5810.348, 5664.424, 5748.1816, 3970.0747, 6119.0386, 223.6029, 5951.2837, 6206.638, 6346.2, 5763.303]
2025-09-12 10:55:55,285 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 10:55:55,285 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1226 [INFO]: New best (5180.31) for latency ExtremeClogL1U23
2025-09-12 10:55:55,292 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 11 hours, 30 minutes, 22 seconds)
2025-09-12 11:08:18,979 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:08:18,983 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 11:13:10,567 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4972.20703 ± 1813.974
2025-09-12 11:13:10,569 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2809.9329, 620.8887, 6429.4526, 6281.692, 6073.4897, 6217.459, 5866.0654, 4123.273, 5292.439, 6007.3804]
2025-09-12 11:13:10,569 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 11:13:10,577 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 11 hours, 13 minutes, 9 seconds)
2025-09-12 11:25:33,227 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:25:33,233 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 11:30:27,079 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3622.59766 ± 2037.253
2025-09-12 11:30:27,081 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3075.1526, 6196.2812, 2052.766, 5826.782, 649.98157, 796.0217, 4359.372, 2149.6643, 5885.869, 5234.085]
2025-09-12 11:30:27,082 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 11:30:27,094 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 10 hours, 55 minutes, 47 seconds)
2025-09-12 11:42:51,248 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:42:51,254 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 11:47:45,954 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3963.02271 ± 2493.423
2025-09-12 11:47:45,955 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [67.872, 6502.677, 3611.1, 6463.5283, 5893.588, 136.8858, 6281.7544, 5853.013, 1401.483, 3418.319]
2025-09-12 11:47:45,955 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 11:47:45,961 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 10 hours, 38 minutes, 57 seconds)
2025-09-12 12:00:09,896 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:00:09,901 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 12:05:02,305 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2818.10400 ± 2690.252
2025-09-12 12:05:02,306 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [78.48681, 6231.3306, 2155.188, 6626.0874, 134.72597, 836.54974, 5582.27, 157.03427, 724.56433, 5654.8022]
2025-09-12 12:05:02,306 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 12:05:02,312 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 10 hours, 22 minutes, 1 second)
2025-09-12 12:17:25,700 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:17:25,705 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 12:22:16,466 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4307.81104 ± 2173.194
2025-09-12 12:22:16,476 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [6249.7104, 6345.4375, 5589.4243, 1184.7762, 5930.499, 2694.9824, 5537.0513, 334.4605, 2993.685, 6218.0806]
2025-09-12 12:22:16,476 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 12:22:16,499 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 10 hours, 4 minutes, 28 seconds)
2025-09-12 12:34:37,514 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:34:37,519 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 12:39:27,998 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4068.88013 ± 1820.470
2025-09-12 12:39:28,000 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1894.7175, 5707.741, 2054.9414, 6154.603, 4609.96, 1616.9543, 2228.29, 6392.0405, 4544.1084, 5485.447]
2025-09-12 12:39:28,000 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 12:39:28,009 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 9 hours, 46 minutes, 46 seconds)
2025-09-12 12:51:51,402 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:51:51,415 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 12:56:40,428 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3324.41992 ± 2292.214
2025-09-12 12:56:40,441 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [5730.2256, 632.72186, 2155.1804, -28.47776, 4081.1638, 5122.409, 1925.2672, 6418.2646, 5992.1616, 1215.2803]
2025-09-12 12:56:40,441 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 12:56:40,450 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 9 hours, 29 minutes, 4 seconds)
2025-09-12 13:09:04,207 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:09:04,216 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 13:13:55,639 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4537.59277 ± 1355.745
2025-09-12 13:13:55,642 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [4896.4746, 5860.34, 2043.648, 3412.0896, 2616.3538, 5972.683, 4233.7153, 4865.163, 6041.4736, 5433.9897]
2025-09-12 13:13:55,642 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 13:13:55,654 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 9 hours, 11 minutes, 26 seconds)
2025-09-12 13:26:18,495 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:26:18,506 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 13:31:09,973 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4844.84326 ± 2033.789
2025-09-12 13:31:09,982 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [771.1483, 4020.7173, 6478.866, 5148.6133, 6448.312, 5630.575, 6459.523, 1329.0753, 6320.592, 5841.0107]
2025-09-12 13:31:09,982 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 13:31:09,993 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 8 hours, 53 minutes, 59 seconds)
2025-09-12 13:43:34,332 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:43:34,338 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 13:48:26,233 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3676.74561 ± 2383.751
2025-09-12 13:48:26,234 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [5786.167, 2629.15, 6512.3696, 6743.9395, 6498.2373, 1734.6566, 349.92996, 2552.7483, 3395.4978, 564.76434]
2025-09-12 13:48:26,234 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 13:48:26,243 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 8 hours, 36 minutes, 58 seconds)
2025-09-12 14:00:50,741 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:00:50,747 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 14:05:43,814 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4314.71973 ± 2121.771
2025-09-12 14:05:43,817 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1352.616, 6480.353, 6969.874, 5922.8477, 5628.4863, 1893.1472, 1702.255, 4566.7495, 6244.2515, 2386.616]
2025-09-12 14:05:43,817 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 14:05:43,831 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 8 hours, 20 minutes, 19 seconds)
2025-09-12 14:18:07,736 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:18:07,745 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 14:22:58,144 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3863.67627 ± 2680.268
2025-09-12 14:22:58,151 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [178.40442, 34.1143, 6565.459, 6036.093, 2765.1875, 4550.7476, 6786.128, 5884.9766, 5699.749, 135.90305]
2025-09-12 14:22:58,151 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 14:22:58,158 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 8 hours, 3 minutes, 15 seconds)
2025-09-12 14:35:22,216 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:35:22,222 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 14:40:13,650 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 5838.64307 ± 724.343
2025-09-12 14:40:13,653 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [4087.7742, 5930.7393, 5491.172, 5793.472, 5892.8843, 5799.1626, 6796.1377, 6886.39, 5731.1235, 5977.569]
2025-09-12 14:40:13,653 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 14:40:13,653 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1226 [INFO]: New best (5838.64) for latency ExtremeClogL1U23
2025-09-12 14:40:13,668 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 7 hours, 46 minutes, 1 second)
2025-09-12 14:52:37,235 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:52:37,242 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 14:57:31,390 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 5604.80371 ± 1195.343
2025-09-12 14:57:31,401 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [5543.991, 6055.6333, 3561.0225, 6263.9517, 6573.679, 6185.4077, 6068.377, 6562.0195, 6215.6313, 3018.322]
2025-09-12 14:57:31,401 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 14:57:31,410 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 7 hours, 29 minutes, 3 seconds)
2025-09-12 15:09:57,745 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:09:57,751 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 15:14:48,543 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4124.85449 ± 1605.875
2025-09-12 15:14:48,562 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [6641.906, 5265.8105, 3685.465, 4003.664, 6471.143, 3929.554, 3971.3313, 2080.569, 3930.7698, 1268.3331]
2025-09-12 15:14:48,562 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 15:14:48,574 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 7 hours, 11 minutes, 51 seconds)
2025-09-12 15:27:10,189 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:27:10,195 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 15:32:01,662 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4832.60742 ± 1352.976
2025-09-12 15:32:01,669 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [6103.216, 2717.9607, 6383.2793, 5478.7803, 5994.954, 5916.9214, 4112.6904, 4734.094, 2329.5933, 4554.5825]
2025-09-12 15:32:01,669 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 15:32:01,676 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 6 hours, 54 minutes, 13 seconds)
2025-09-12 15:44:03,445 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:44:03,461 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 15:48:35,104 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4088.16553 ± 1818.163
2025-09-12 15:48:35,105 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1829.8461, 2038.4924, 4754.303, 5620.924, 5752.354, 2191.561, 6631.0464, 2984.9736, 2735.4204, 6342.7393]
2025-09-12 15:48:35,105 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 15:48:35,112 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 6 hours, 33 minutes, 49 seconds)
2025-09-12 15:59:52,647 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:59:52,657 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 16:04:27,888 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4005.87646 ± 2210.265
2025-09-12 16:04:27,891 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [5088.0713, 2222.5554, 1161.1256, 6389.099, 127.68243, 6084.6875, 2255.18, 5018.767, 5391.1284, 6320.4697]
2025-09-12 16:04:27,891 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 16:04:27,901 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 6 hours, 10 minutes, 38 seconds)
2025-09-12 16:16:25,579 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:16:25,585 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 16:21:08,244 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4340.00928 ± 1794.334
2025-09-12 16:21:08,247 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1483.1815, 5089.6196, 5858.839, 4173.704, 5837.58, 2094.5803, 1680.0173, 6004.0767, 6255.555, 4922.942]
2025-09-12 16:21:08,247 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 16:21:08,258 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 5 hours, 51 minutes, 10 seconds)
2025-09-12 16:33:05,194 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:33:05,205 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 16:37:50,512 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4391.26416 ± 1949.970
2025-09-12 16:37:50,514 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2393.8481, 1161.6107, 6268.084, 6413.222, 5063.9385, 5859.9385, 3359.4912, 6074.39, 1608.0215, 5710.094]
2025-09-12 16:37:50,514 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 16:37:50,523 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 5 hours, 32 minutes, 7 seconds)
2025-09-12 16:49:48,704 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:49:48,711 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 16:54:32,043 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4256.03223 ± 2391.588
2025-09-12 16:54:32,050 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [5730.125, 6084.1333, 511.00156, 6166.3286, 1533.8223, 6366.5923, 4422.427, 5905.516, 5733.1255, 107.2516]
2025-09-12 16:54:32,050 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 16:54:32,057 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 5 hours, 13 minutes, 31 seconds)
2025-09-12 17:06:31,285 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:06:31,290 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 17:11:12,934 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3359.65625 ± 2607.597
2025-09-12 17:11:12,936 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [6624.6914, 6815.483, 2959.8042, 338.95276, 2391.5576, 1756.5474, 624.10626, 6247.746, 56.34067, 5781.3364]
2025-09-12 17:11:12,936 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 17:11:12,945 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 4 hours, 57 minutes, 28 seconds)
2025-09-12 17:23:11,242 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:23:11,247 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 17:27:56,017 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3945.71826 ± 2180.050
2025-09-12 17:27:56,026 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [6715.391, 3033.5144, 4695.4463, 2670.3005, 6027.351, 2601.0574, 143.49449, 1410.7969, 6671.7617, 5488.0703]
2025-09-12 17:27:56,026 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 17:27:56,039 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 4 hours, 43 minutes, 47 seconds)
2025-09-12 17:39:55,442 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:39:55,454 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 17:44:40,068 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3291.23486 ± 2538.385
2025-09-12 17:44:40,069 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [12.4225855, 332.57742, 5242.648, 1244.7772, 5852.2275, 6607.032, 481.0692, 4845.802, 2197.2104, 6096.5796]
2025-09-12 17:44:40,069 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 17:44:40,081 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 4 hours, 27 minutes, 17 seconds)
2025-09-12 17:56:39,655 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:56:39,667 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 18:01:25,766 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4936.30615 ± 2210.283
2025-09-12 18:01:25,773 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [6396.521, 6545.521, 1784.602, 6133.952, 3418.823, 6801.6523, 6072.8423, 5668.297, 6401.24, 139.60898]
2025-09-12 18:01:25,773 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 18:01:25,782 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 4 hours, 10 minutes, 45 seconds)
2025-09-12 18:13:24,818 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:13:24,832 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 18:18:08,932 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4388.74170 ± 2577.003
2025-09-12 18:18:08,933 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [711.3565, 103.144005, 5905.1636, 5924.645, 621.2875, 5723.465, 6303.6553, 5691.7305, 6374.9155, 6528.0557]
2025-09-12 18:18:08,933 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 18:18:08,941 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 3 hours, 54 minutes, 7 seconds)
2025-09-12 18:30:08,948 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:30:08,953 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 18:34:51,157 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4435.27783 ± 2238.714
2025-09-12 18:34:51,159 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3358.2593, 1238.8132, 7142.201, 466.54654, 6719.159, 6594.356, 3641.6035, 5998.462, 5749.8345, 3443.5376]
2025-09-12 18:34:51,159 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 18:34:51,168 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 3 hours, 37 minutes, 27 seconds)
2025-09-12 18:46:51,819 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:46:51,830 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 18:51:35,163 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3086.43359 ± 2508.288
2025-09-12 18:51:35,164 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1260.2745, 89.337395, 1125.9279, 6159.9077, 286.1483, 2639.5073, 6563.1875, 5629.311, 5821.713, 1289.0214]
2025-09-12 18:51:35,164 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 18:51:35,173 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 3 hours, 20 minutes, 45 seconds)
2025-09-12 19:03:35,111 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:03:35,117 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 19:08:17,661 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4574.91504 ± 2571.286
2025-09-12 19:08:17,662 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [6002.343, 3543.381, -71.318825, 6460.212, 7138.478, 3460.133, 6257.7783, 106.41234, 5975.591, 6876.139]
2025-09-12 19:08:17,662 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 19:08:17,671 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 3 hours, 3 minutes, 58 seconds)
2025-09-12 19:20:17,220 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:20:17,226 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 19:24:58,849 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4168.47705 ± 2201.875
2025-09-12 19:24:58,849 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [6195.656, 2120.7622, 6010.248, 1988.2366, 5546.0005, 6509.0874, 5242.6562, 1914.6334, 5911.5557, 245.93236]
2025-09-12 19:24:58,849 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 19:24:58,858 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 2 hours, 47 minutes, 6 seconds)
2025-09-12 19:36:59,604 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:36:59,612 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 19:41:43,627 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 5829.48047 ± 950.702
2025-09-12 19:41:43,628 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [5507.2715, 5988.2866, 6011.84, 4207.757, 6124.074, 6798.5225, 6815.0684, 4081.5525, 5838.465, 6921.966]
2025-09-12 19:41:43,628 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 19:41:43,637 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 2 hours, 30 minutes, 26 seconds)
2025-09-12 19:53:42,192 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:53:42,203 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 19:58:25,182 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4318.69580 ± 2126.210
2025-09-12 19:58:25,184 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [4700.622, 905.9542, 5260.9487, 5873.442, 5475.694, 2494.0728, 334.54816, 6064.8525, 5488.532, 6588.288]
2025-09-12 19:58:25,184 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 19:58:25,194 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 2 hours, 13 minutes, 42 seconds)
2025-09-12 20:10:25,050 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:10:25,056 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 20:15:10,712 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 5293.09863 ± 1964.393
2025-09-12 20:15:10,719 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [5851.9346, -14.281436, 5268.443, 4311.1587, 6218.8325, 7305.6826, 6045.7925, 6594.133, 4756.733, 6592.5566]
2025-09-12 20:15:10,719 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:15:10,727 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 57 minutes, 1 second)
2025-09-12 20:27:10,975 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:27:10,980 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 20:31:56,064 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4902.14795 ± 1925.032
2025-09-12 20:31:56,067 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [5859.6465, 1959.0851, 1662.8842, 5728.7334, 6532.1865, 6029.922, 5794.905, 2401.388, 6372.873, 6679.857]
2025-09-12 20:31:56,067 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:31:56,080 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 40 minutes, 22 seconds)
2025-09-12 20:43:55,369 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:43:55,374 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 20:48:38,919 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 5241.39307 ± 1772.306
2025-09-12 20:48:38,920 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [4080.6477, 6298.7524, 4481.842, 727.66187, 7076.2495, 5017.327, 7010.0195, 5976.805, 5634.8804, 6109.7476]
2025-09-12 20:48:38,920 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:48:38,932 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 23 minutes, 40 seconds)
2025-09-12 21:00:39,126 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:00:39,133 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 21:05:24,549 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3951.74072 ± 2550.714
2025-09-12 21:05:24,551 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [58.054066, 6174.395, 1981.875, 6264.4614, 5997.6177, 2147.3433, 6921.873, 5804.265, 4211.8687, -44.345543]
2025-09-12 21:05:24,551 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:05:24,561 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 1 hour, 6 minutes, 56 seconds)
2025-09-12 21:17:21,855 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:17:21,860 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 21:22:03,822 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3582.83325 ± 2201.906
2025-09-12 21:22:03,823 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [7203.721, 1270.3014, 3680.585, 616.2576, 1293.9615, 6099.4976, 5474.486, 3134.5073, 1816.2021, 5238.8115]
2025-09-12 21:22:03,823 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:22:03,832 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 50 minutes, 11 seconds)
2025-09-12 21:34:00,332 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:34:00,338 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 21:38:43,728 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 5474.31738 ± 1814.727
2025-09-12 21:38:43,730 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [426.05154, 6883.6367, 6840.238, 6563.629, 5814.156, 6661.1763, 5153.747, 4749.512, 5895.9595, 5755.067]
2025-09-12 21:38:43,730 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:38:43,760 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 33 minutes, 25 seconds)
2025-09-12 21:50:43,535 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:50:43,541 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 21:55:28,607 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 5034.38086 ± 2053.781
2025-09-12 21:55:28,608 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [421.35776, 6623.4243, 6227.2354, 6254.788, 6704.9917, 5261.716, 1862.8744, 5919.3877, 4734.254, 6333.7812]
2025-09-12 21:55:28,608 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:55:28,620 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 16 minutes, 42 seconds)
2025-09-12 22:07:28,751 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:07:28,756 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 22:12:10,921 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 4460.53125 ± 2320.473
2025-09-12 22:12:10,923 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [880.6912, 5845.505, 726.4903, 1167.1552, 5746.3247, 6134.948, 6185.977, 5834.452, 6045.2227, 6038.549]
2025-09-12 22:12:10,923 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:12:10,933 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-halfcheetah):1251 [DEBUG]: Training session finished
