Hindi Urdu Machine Transliteration
Important: First Workshop on South and Southeast Asian Natural Language Processing

"One man's Hindi is other man's Urdu" (Rai, 2000). The major difference between Hindi and Urdu is that the former is written in the Devanagari script with a more Sanskritized vocabulary, while the latter is written in the Urdu script (a derivative of the Perso-Arabic script) with more vocabulary borrowed from Persian and Arabic. Despite this difference in script, Hindi and Urdu share grammar, morphology, a large common vocabulary, history, classical literature, and cultural heritage. Hindi is the official language of India, with 366 million native speakers. Urdu is the national language of Pakistan and one of the state languages of India, with 60 million native speakers (Rahman, 2004). The following table gives an idea of the number of Hindi and Urdu speakers.

Language   Native Speakers   2nd Language Speakers   Total
Hindi      366,000,000       487,000,000             853,000,000
Urdu       60,290,000        104,000,000             164,290,000
Total      426,290,000       591,000,000             1,017,290,000

Hindi and Urdu, being varieties of the same language, are spoken by a large share of the world's population. Hindi and Urdu speakers can understand each other's speech but not each other's writing. HUMT (Hindi Urdu Machine Transliteration) is an effort to bridge this divide in script between India and Pakistan.
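
As a rough illustration of what moving text between the two scripts involves, the sketch below maps a handful of Devanagari consonants to Urdu-script counterparts with a simple lookup table. It is a toy under stated assumptions, not the HUMT system: the names `DEVANAGARI_TO_URDU` and `transliterate` are hypothetical, only a few consonants are covered, and vowels, diacritics, and context-dependent rules are ignored.

```python
# Toy lookup table: a few Devanagari consonants and Urdu-script counterparts.
# Hypothetical, illustrative subset; this is NOT the mapping used by HUMT.
DEVANAGARI_TO_URDU = {
    "क": "ک",  # ka
    "ब": "ب",  # ba
    "म": "م",  # ma
    "ल": "ل",  # la
    "स": "س",  # sa
    "न": "ن",  # na
    "र": "ر",  # ra
}


def transliterate(text: str) -> str:
    """Replace each known Devanagari character; leave all others unchanged."""
    return "".join(DEVANAGARI_TO_URDU.get(ch, ch) for ch in text)


if __name__ == "__main__":
    print(transliterate("कल"))  # input reads "kal"
    print(transliterate("सब"))  # input reads "sab"
```

A per-character lookup like this only works for the simplest words. Urdu script normally leaves short vowels unwritten, and a single Devanagari character can correspond to more than one Urdu letter, so a practical system needs context-sensitive rules. The paper listed in the references approaches the problem with finite-state transducers.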


Malik, M G Abbas; Boitet, Christian; Bhattacharyya, Pushpak. 2008. Hindi Urdu Machine Transliteration using Finite-state Transducers. In Proceedings of the 22nd International Conference on Computational Linguistics, August 18-22, 2008, Manchester, UK.

Rahman, Tariq. 2004. Language Policy and Localization in Pakistan: Proposal for a Paradigmatic Shift. Crossing the Digital Divide, SCALLA Conference on Computational Linguistics.

Rai, Alok. 2000. Hindi Nationalism. Orient Longman Private Limited, New Delhi.
