MegaHan97K: A large-scale dataset for mega-category Chinese character recognition with over 97K categories

Published: 01 Jan 2025, Last Modified: 22 Jul 2025, Pattern Recognit. 2025, CC BY-SA 4.0
Abstract: Highlights
• MegaHan97K contains 97,455 Chinese character categories, six times more than existing datasets.
• MegaHan97K supports the GB18030-2022 standard for comprehensive Chinese character coverage.
• MegaHan97K comprises handwritten, historical, and synthetic subsets.
• MegaHan97K provides balanced samples across categories.