# EDINET-Bench

This code evaluates LLMs on EDINET-Bench, a Japanese financial benchmark of challenging financial tasks including accounting fraud detection, earnings forecasting, and industry prediction.
The dataset is built from [EDINET](https://disclosure2.edinet-fsa.go.jp), a platform managed by the Financial Services Agency (FSA) of Japan that provides access to disclosure documents such as securities reports.


## Install
Install the dependencies using uv.
```bash
uv sync
```

You also need to set the API key for each LLM provider you use in the `.env` file.
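As a rough illustration, a `.env` file might look like the sketch below. The variable names are assumptions: `ANTHROPIC_API_KEY` and `OPENAI_API_KEY` are the names the official Anthropic and OpenAI SDKs read by default, but the names this code actually expects may differ, and the values are placeholders.

```bash
# .env (hypothetical variable names and placeholder values)
ANTHROPIC_API_KEY=your-anthropic-api-key
OPENAI_API_KEY=your-openai-api-key
```

Only the providers whose models you plan to evaluate should need an entry.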

## Evaluation

### Accounting Fraud Detection and Earnings Forecast

Use Claude 3.5 Sonnet to predict whether a report is fraudulent based on the Balance Sheet (BS), Cash Flow (CF), Profit and Loss (PL), and summary items from annual reports.
```bash
$ python src/edinet_bench/predict.py --task fraud_detection --model claude-3-5-sonnet-20241022 --sheets bs cf pl summary
```


Use a logistic model as a baseline (shown here for earnings forecasting).
```bash
$ python src/edinet_bench/logistic.py --task earnings_forecast
```

Create a leaderboard of model results for a task.
```bash
$ python src/edinet_bench/make_leaderboard.py --task fraud_detection
```
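
The three steps above can be chained for both tasks with a small wrapper script. This is only a sketch: it assumes each script accepts both `--task` values that appear in this README (`fraud_detection` and `earnings_forecast`), which is shown here only for some combinations. The helper names `run` and `run_task` and the `DRY_RUN` switch are hypothetical and not part of the repository.

```bash
# Sketch: run prediction, baseline, and leaderboard for each task.
# Assumption: every script accepts both --task values used in this README.

MODEL="claude-3-5-sonnet-20241022"
TASKS=(fraud_detection earnings_forecast)
SHEETS=(bs cf pl summary)

# Print the commands by default; set DRY_RUN=0 to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

# Either echo a command or execute it, depending on DRY_RUN.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Run the three steps for a single task.
run_task() {
  local task="$1"
  run python src/edinet_bench/predict.py --task "$task" --model "$MODEL" --sheets "${SHEETS[@]}"
  run python src/edinet_bench/logistic.py --task "$task"
  run python src/edinet_bench/make_leaderboard.py --task "$task"
}

for task in "${TASKS[@]}"; do
  run_task "$task"
done
```

Running the script as written only prints the commands, which makes it easy to check them before spending API credits. Setting `DRY_RUN=0` in the environment executes them instead.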

### Industry Prediction
Predict a company's industry type (e.g., Banking) based on its current annual report.
```bash
$ python src/edinet_bench/industry_prediction/predict.py --model claude-3-5-sonnet-20241022 --sheets bs cf pl summary
```

Create a leaderboard of model results.
```bash
$ python src/edinet_bench/industry_prediction/make_leaderboard.py
```

