+-------------------+--------+----------+-------------+----------+----------------+
|                   | Random | Semantic | Tokens Only | Base LLM | Fine-Tuned LLM |
+-------------------+--------+----------+-------------+----------+----------------+
|      Accuracy     |  50%   |   63%    |     62%     |   60%    |      67%       |
| Balanced Accuracy |  50%   |   59%    |     61%     |   55%    |      64%       |
|     Precision     |  44%   |   64%    |     57%     |   76%    |      69%       |
|       Recall      |  50%   |   34%    |     56%     |   12%    |      42%       |
|      F1 Score     |  47%   |   44%    |     56%     |   21%    |      52%       |
|    Specificity    |  50%   |   85%    |     67%     |   97%    |      85%       |
|        NPV        |  56%   |   62%    |     66%     |   59%    |      66%       |
+-------------------+--------+----------+-------------+----------+----------------+
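
For reference, every row in the table can be derived from the four cells of a binary confusion matrix. The sketch below uses the standard textbook definitions; in the table, the Balanced row (balanced accuracy) agrees with the mean of Recall and Specificity in each column to within rounding. The function name and the example counts are hypothetical, chosen only for illustration; they are not the evaluation data behind the table, and the sketch assumes every denominator is non-zero.

```python
def classification_metrics(tp, fp, tn, fn):
    """Derive the table's metrics from confusion-matrix counts.

    Assumes every denominator is non-zero (each class occurs at least
    once and the classifier predicts each class at least once).
    """
    recall = tp / (tp + fn)        # true positive rate (sensitivity)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)     # positive predictive value
    npv = tn / (tn + fn)           # negative predictive value
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "balanced_accuracy": (recall + specificity) / 2,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "specificity": specificity,
        "npv": npv,
    }


# Hypothetical counts for illustration only (not taken from the table).
metrics = classification_metrics(tp=40, fp=10, tn=90, fn=60)
for name, value in metrics.items():
    print(f"{name:>17}: {value:.0%}")
```

A classifier with high precision and specificity but low recall, like the Base LLM column, rarely predicts the positive class: its few positive predictions are mostly correct, yet it misses most true positives, which is why its F1 score falls below the random baseline.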