Hindi Urdu Machine Transliteration
Important: First Workshop on South and Southeast Asian Natural Language Processing
"One man's Hindi is other man's Urdu" (Rai, 2000). The major difference between Hindi and Urdu is that the former is written in the Devanagari script with a more Sanskritized vocabulary, while the latter is written in the Urdu script (derived from the Perso-Arabic script) with more vocabulary borrowed from Persian and Arabic. Apart from this difference in script, Hindi and Urdu share their grammar, morphology, a large common vocabulary, history, classical literature and cultural heritage. Hindi is the official language of India, with 366 million native speakers. Urdu is the national language of Pakistan and one of the state languages of India, with 60 million native speakers (Rahman, 2004). The following table gives an idea of the size of Hindi and Urdu.
Hindi and Urdu, being varieties of the same language, are spoken by a large proportion of the world's population. Speakers of Hindi and Urdu can understand each other's spoken language but not each other's writing. HUMT is an effort to bridge this script divide between India and Pakistan.
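To make the script divide concrete, here is a minimal Python sketch of character-level Hindi-to-Urdu transliteration. It is a toy under stated assumptions: the table `DEV_TO_URDU` and the function `transliterate` are hypothetical illustrations covering only a handful of characters, and this is not the finite-state transducer approach of Malik et al. (2008). It assumes that the Devanagari short vowel signs and the virama can simply be dropped, as Urdu spelling usually leaves short vowels unwritten.

```python
# Toy character-level Hindi (Devanagari) -> Urdu (Perso-Arabic) transliterator.
# Hypothetical illustration only: a tiny subset of characters, not HUMT's
# finite-state transducers. Code points are written as escapes; the glyphs
# are given in the comments.
DEV_TO_URDU = {
    # Independent vowels (word-initial)
    "\u0905": "\u0627",        # अ -> ا (alef)
    "\u0906": "\u0622",        # आ -> آ (alef with madda)
    # Consonants (the inherent vowel is not written in Urdu either)
    "\u0915": "\u06A9",        # क -> ک (keheh)
    "\u0917": "\u06AF",        # ग -> گ (gaf)
    "\u091C": "\u062C",        # ज -> ج (jeem)
    "\u0924": "\u062A",        # त -> ت (teh)
    "\u0926": "\u062F",        # द -> د (dal)
    "\u0928": "\u0646",        # न -> ن (noon)
    "\u092A": "\u067E",        # प -> پ (peh)
    "\u092C": "\u0628",        # ब -> ب (beh)
    "\u092D": "\u0628\u06BE",  # भ -> بھ (beh + do-chashmi heh marks aspiration)
    "\u092E": "\u0645",        # म -> م (meem)
    "\u0930": "\u0631",        # र -> ر (reh)
    "\u0932": "\u0644",        # ल -> ل (lam)
    "\u0938": "\u0633",        # स -> س (seen)
    "\u0939": "\u06C1",        # ह -> ہ (heh goal)
    # Dependent vowel signs (matras)
    "\u093E": "\u0627",        # ा -> ا (long a written as alef)
    "\u0940": "\u06CC",        # ी -> ی (long i written as yeh)
    "\u0942": "\u0648",        # ू -> و (long u written as vav)
    "\u093F": "",              # ि short i: assumed left unwritten in Urdu
    "\u0941": "",              # ु short u: assumed left unwritten in Urdu
    "\u094D": "",              # ् virama (suppresses inherent vowel): nothing to write
}


def transliterate(text: str) -> str:
    """Map each Devanagari code point through DEV_TO_URDU.

    Characters missing from the table (spaces, punctuation, unmapped
    letters) are copied through unchanged.
    """
    return "".join(DEV_TO_URDU.get(ch, ch) for ch in text)


if __name__ == "__main__":
    print(transliterate("भारत"))  # prints بھارت
```

The opposite direction (Urdu to Hindi) is harder than this sketch suggests: because Urdu text normally omits the short vowels that Devanagari writes explicitly, a one-to-one table cannot recover them, which is one reason the cited work uses richer models such as finite-state transducers.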
Malik, M. G. Abbas; Boitet, Christian; Bhattacharyya, Pushpak. 2008. Hindi Urdu Machine Transliteration using Finite-state Transducers. In Proceedings of the 22nd International Conference on Computational Linguistics, August 18-22, 2008, Manchester, UK.
Rahman, Tariq. 2004. Language Policy and Localization in Pakistan: Proposal for a Paradigmatic Shift. In Crossing the Digital Divide, SCALLA Conference on Computational Linguistics.
Rai, Alok. 2000. Hindi Nationalism. Orient Longman Private Limited, New Delhi.