OmniPrint: A Configurable Printed Character Synthesizer

Jun 07, 2021 (edited Jul 15, 2021), NeurIPS 2021 Datasets and Benchmarks Track (Round 1)
  • Keywords: ocr, meta-learning, synthesizer
  • TL;DR: We introduce OmniPrint, a synthetic data generator of isolated printed characters, geared toward machine learning research.
  • Abstract: We introduce OmniPrint, a synthetic data generator of isolated printed characters, geared toward machine learning research. It draws inspiration from well-known datasets such as MNIST, SVHN and Omniglot, but can generate a wide variety of printed characters from various languages, fonts and styles, with customizable distortions. We include 935 fonts from 27 scripts and many types of distortions. As a proof of concept, we show various use cases, including an example of a meta-learning dataset designed for the upcoming MetaDL NeurIPS 2021 competition. OmniPrint will be open-sourced after the competition.
  • Supplementary Material: zip