# Illuin Layout dataset

Internal dataset of scraped PDFs, collected by Anon.
This dataset is composed of French industrial PDFs from large companies. It contains pages from over 36,000 PDFs,
which yield over 610,000 pages of data once processed, and 170 million words.