Developing lexicographic sorting: An example for Urdu

Published: 01 Jan 2007, Last Modified: 24 Feb 2025. ACM Trans. Asian Lang. Inf. Process. 2007. CC BY-SA 4.0
Abstract: Collation, or lexicographic sorting, is essential to developing multilingual computing. This paper presents the challenges faced in developing a collation sequence for a language. The paper discusses both the theoretical linguistic considerations and the practical standardization- and encoding-related considerations that need to be addressed for languages for which relevant standards and/or solutions have not been defined. The paper also defines the process by giving the details of the procedure followed for Urdu, which is the national language of Pakistan and is spoken by more than 100 million people across the world. The paper is oriented towards organizations involved in developing and using collation standards and towards the localization industry; it is not focused on theoretical issues.