Abstract: Panjabi (also referred to as Punjabi) is the name given to a collection of tonal languages originating in the Punjab region of South Asia. It is the ninth most spoken language in the world, with speakers accounting for roughly 1.9% of the world population. Panjabi is written in two scripts, Gurmukhi and Shahmukhi. Yet it can be considered a "low resource language" because it lacks the basic building blocks of Natural Language Processing (NLP) research. Toshakhana is our attempt to build the first Panjabi corpus in Gurmukhi script with a temporal component.