Scores: each cell reports Accuracy / F1 Score; the FinanceBench column reports BERTScore (F1) only.

| Model Grouping | Model Name | Tag Extraction | Value Extraction | Formula Construction | Formula Calculation | FinanceBench | Financial Math |
|---|---|---|---|---|---|---|---|
| Financial Models | Model | - | - | - | - | - | - |
| Base Models | Llama 3.1 8B | 69.16 / 0.739 | 52.46 / 0.565 | 12.92 / 0.201 | 27.27 / 0.317 | 0.443 | 11.00 / 0.136 |
| | Llama 3.1 70B | 69.64 / 0.782 | 88.19 / 0.904 | 59.28 / 0.665 | 77.49 / 0.783 | 0.528 | 10.50 / 0.134 |
| | DeepSeek V3 | 85.03 / 0.849 | 98.01 / 0.982 | 22.75 / 0.315 | 85.99 / 0.868 | 0.573 | 21.50 / 0.255 |
| | GPT-4o | 81.60 / 0.864 | 97.01 / 0.974 | 79.76 / 0.820 | 83.59 / 0.857 | 0.564 | 27.00 / 0.296 |
| | Gemini 2.0 FL | 80.27 / 0.811 | 98.02 / 0.980 | 61.90 / 0.644 | 53.57 / 0.536 | 0.552 | 19.00 / 0.204 |
| Fine-tuned Models | Llama 3.1 8B LoRA | 89.13 / 0.886 | 98.49 / 0.986 | 77.61 / 0.876 | 98.68 / 0.990 | 0.511 | 30.00 / 0.332 |
| | Llama 3.1 8B QLoRA | 86.89 / 0.872 | 97.14 / 0.974 | 89.34 / 0.898 | 92.81 / 0.947 | 0.542 | 26.50 / 0.307 |
| | Llama 3.1 8B DoRA | 80.44 / 0.896 | 98.57 / 0.988 | 88.02 / 0.882 | 98.92 / 0.993 | 0.477 | 28.50 / 0.317 |
| | Llama 3.1 8B rsLoRA | 85.26 / 0.879 | 99.13 / 0.992 | 89.46 / 0.893 | 98.80 / 0.988 | 0.575 | 34.50 / 0.370 |
| | Gemini 2.0 FL (N/A) | 85.03 / 0.907 | 99.20 / 0.992 | 67.85 / 0.786 | 54.76 / 0.548 | 0.544 | 66.00 / 0.785 |
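
For readers unfamiliar with the two headline metrics, the following is a minimal sketch of how an accuracy percentage and an F1 score could be computed from gold and predicted labels. It is not the leaderboard's evaluation code: the source does not say which F1 variant (macro, micro, or token-level) is used, so macro-averaged F1 is an assumption here, and the function names and tag labels are hypothetical. BERTScore is left out because it requires a pretrained embedding model.

```python
from collections import Counter


def accuracy(gold, pred):
    """Percentage of predictions that exactly match the gold label."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        return 0.0
    correct = sum(g == p for g, p in zip(gold, pred))
    return 100.0 * correct / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 over every label seen in gold or pred.

    Assumption: macro averaging. The leaderboard may use a different variant.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    labels = set(gold) | set(pred)
    if not labels:
        return 0.0

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p, but it was wrong
            fn[g] += 1  # gold label g was missed

    scores = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical tag-extraction labels, for illustration only.
gold = ["Revenue", "NetIncome", "Assets", "Revenue"]
pred = ["Revenue", "Assets", "Assets", "Revenue"]

print(f"accuracy = {accuracy(gold, pred):.2f}")
print(f"macro F1 = {macro_f1(gold, pred):.3f}")
```

Under macro averaging, a label that is never predicted correctly contributes an F1 of 0 to the mean, so F1 can fall well below the accuracy figure when errors concentrate on a few labels.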