+-------------+--------+----------+-------------+----------+----------------+
|             | Random | Semantic | Tokens Only | Base LLM | Fine-Tuned LLM |
+-------------+--------+----------+-------------+----------+----------------+
|   Accuracy  | 50.0%  |  70.9%   |    70.9%    |  68.5%   |       -        |
|  Precision  | 40.0%  |  71.4%   |    62.5%    |    -     |       -        |
|    Recall   | 50.0%  |  45.5%   |    68.2%    |    -     |       -        |
|   F1 Score  | 44.4%  |  55.6%   |    65.2%    |    -     |       -        |
| Specificity | 50.0%  |  87.9%   |    72.7%    |    -     |       -        |
|     NPV     | 60.0%  |  70.7%   |    77.4%    |    -     |       -        |
+-------------+--------+----------+-------------+----------+----------------+
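
The six rows above are standard binary-classification metrics, all derivable from the four confusion-matrix counts. The sketch below shows that arithmetic. The source does not report raw counts, so the function name `binary_metrics` and the counts used (TP=10, FP=4, TN=29, FN=12) are assumptions: hypothetical values chosen only because they happen to reproduce the Semantic column, not figures taken from the evaluation itself.

```python
def binary_metrics(tp, fp, tn, fn):
    """Return the six table metrics as fractions in [0, 1].

    Assumes every denominator is non-zero (at least one predicted
    positive, one predicted negative, one actual positive, one actual
    negative); no zero-division guard is included in this sketch.
    """
    total = tp + fp + tn + fn
    precision = tp / (tp + fp)   # of predicted positives, share that are correct
    recall = tp / (tp + fn)      # of actual positives, share that are found
    return {
        "Accuracy": (tp + tn) / total,
        "Precision": precision,
        "Recall": recall,
        "F1 Score": 2 * precision * recall / (precision + recall),
        "Specificity": tn / (tn + fp),  # of actual negatives, share correctly rejected
        "NPV": tn / (tn + fn),          # of predicted negatives, share that are correct
    }


# Hypothetical counts (an assumption, not reported in the source),
# picked to be consistent with the Semantic column.
semantic = binary_metrics(tp=10, fp=4, tn=29, fn=12)

for name, value in semantic.items():
    print(f"{name:>11}: {value:.1%}")
```

Reading the metrics this way also explains why Semantic and Tokens Only can share the same accuracy while differing on every other row: accuracy only counts TP + TN, whereas the other metrics depend on how the errors split between false positives and false negatives.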