DuoSearch: A Novel Search Engine for Bulgarian Historical Documents

Published: 01 Jan 2023, Last Modified: 08 Jun 2023, CoRR 2023, Readers: Everyone
Abstract: Search in collections of digitised historical documents is hindered by a two-pronged problem: orthographic variety and optical character recognition (OCR) errors. We present DuoSearch, a new search engine for historical documents that uses ElasticSearch and machine learning methods based on deep neural networks to address this problem. It was tested on a collection of Bulgarian historical newspapers from the mid-19th to the mid-20th century. The system provides an interactive and intuitive interface that lets end users enter search terms in modern Bulgarian and search across historical spellings. This is the first solution facilitating the use of digitised historical documents in Bulgarian.
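
To make the idea of searching historical spellings from a modern query concrete, here is a minimal sketch. It is not the DuoSearch implementation: the abstract describes deep neural network methods, while this example swaps in a few hand-written rules based on well-known pre-1945 Bulgarian orthography (word-final "ъ" after a consonant, "ѣ" where modern spelling has "я" or "е", "ѫ" where it has "ъ"). The function names, the field name `text`, and the use of `fuzziness` as a rough stand-in for OCR tolerance are all assumptions made for illustration.

```python
# Hypothetical rule-based stand-in for a learned spelling model.
# Consonants that could take a word-final "ъ" in pre-1945 spelling ("й" excluded).
FINAL_CONSONANTS = set("бвгджзклмнпрстфхцчшщ")

# (modern letter, historical letter) pairs; candidates only, not guaranteed spellings.
SUBSTITUTIONS = (("я", "ѣ"), ("е", "ѣ"), ("ъ", "ѫ"))


def historical_variants(term: str) -> list[str]:
    """Return the modern term followed by candidate historical spellings."""
    term = term.lower()
    stems = [term]
    for modern, old in SUBSTITUTIONS:
        # Iterate over a snapshot so newly added stems are handled by later rules only.
        for stem in list(stems):
            for i, ch in enumerate(stem):
                if ch == modern:
                    candidate = stem[:i] + old + stem[i + 1:]
                    if candidate not in stems:
                        stems.append(candidate)
    variants = list(stems)
    # Add the word-final "ъ" form of every stem when the term ends in a consonant.
    if term and term[-1] in FINAL_CONSONANTS:
        variants += [stem + "ъ" for stem in stems]
    return variants


def build_query(term: str, field: str = "text") -> dict:
    """Build an Elasticsearch bool/should query body over all spelling candidates.

    The field name is an assumption; "fuzziness": "AUTO" is used here only as a
    simple way to tolerate small OCR errors.
    """
    clauses = [
        {"match": {field: {"query": variant, "fuzziness": "AUTO"}}}
        for variant in historical_variants(term)
    ]
    return {"query": {"bool": {"should": clauses, "minimum_should_match": 1}}}


if __name__ == "__main__":
    import json
    print(json.dumps(build_query("град"), ensure_ascii=False, indent=2))
```

The resulting dictionary could be passed as the body of a search request to an Elasticsearch index of OCR text. A learned model would replace `historical_variants` while leaving the query construction unchanged.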