Scores: Accuracy / F1 Score / BERTScore (F1).
| Model Grouping | Model Name | Tag Extraction | Value Extraction | Formula Construction | Formula Calculation | FinanceBench | Financial Math |
|---|---|---|---|---|---|---|---|
| Financial Models | Model | - | - | - | - | - | - |
| Base Models | Llama 3.1 8B | 69.16 / 0.739 | 52.46 / 0.565 | 12.92 / 0.201 | 27.27 / 0.317 | 0.443 | 11.00 / 0.136 |
| Base Models | Llama 3.1 70B | 69.64 / 0.782 | 88.19 / 0.904 | 59.28 / 0.665 | 77.49 / 0.783 | 0.528 | 10.50 / 0.134 |
| Base Models | DeepSeek V3 | 85.03 / 0.849 | 98.01 / 0.982 | 22.75 / 0.315 | 85.99 / 0.868 | 0.573 | 21.50 / 0.255 |
| Base Models | GPT-4o | 81.60 / 0.864 | 97.01 / 0.974 | 79.76 / 0.820 | 83.59 / 0.857 | 0.564 | 27.00 / 0.296 |
| Base Models | Gemini 2.0 FL | 80.27 / 0.811 | 98.02 / 0.980 | 61.90 / 0.644 | 53.57 / 0.536 | 0.552 | 19.00 / 0.204 |
| Fine-tuned Models | Llama 3.1 8B LoRA | 89.13 / 0.886 | 98.49 / 0.986 | 77.61 / 0.876 | 98.68 / 0.990 | 0.511 | 30.00 / 0.332 |
| Fine-tuned Models | Llama 3.1 8B QLoRA | 86.89 / 0.872 | 97.14 / 0.974 | 89.34 / 0.898 | 92.81 / 0.947 | 0.542 | 26.50 / 0.307 |
| Fine-tuned Models | Llama 3.1 8B DoRA | 80.44 / 0.896 | 98.57 / 0.988 | 88.02 / 0.882 | 98.92 / 0.993 | 0.477 | 28.50 / 0.317 |
| Fine-tuned Models | Llama 3.1 8B rsLoRA | 85.26 / 0.879 | 99.13 / 0.992 | 89.46 / 0.893 | 98.80 / 0.988 | 0.575 | 34.50 / 0.370 |
| Fine-tuned Models | Gemini 2.0 FL N/A | 85.03 / 0.907 | 99.20 / 0.992 | 67.85 / 0.786 | 54.76 / 0.548 | 0.544 | 66.00 / 0.785 |