This website is dedicated to Natural Language Processing (NLP) and Computational Linguistics (CL) work on South and Southeast Asian languages. Here you will find online systems for these languages, computational resources, a comprehensive contact list of people working on these languages, and more.
South and Southeast Asian Region and its Languages
South Asia comprises Afghanistan, Bangladesh, Bhutan, India, the Maldives, Nepal, Pakistan and Sri Lanka. Southeast Asia consists of Burma, Cambodia, Laos, Thailand, Vietnam, Malaysia, Brunei, East Timor, Indonesia, the Philippines and Singapore. The following table lists the population and the number of living languages of each country in the two regions.
Sr. | Country | Population | Living Languages
1 | India | 1,134,403,000 | 438
2 | Indonesia | 248,496,420 | 719
3 | Pakistan | 158,081,000 | 72
4 | Bangladesh | 153,281,000 | 42
5 | Viet Nam | 85,029,000 | 106
6 | Philippines | 84,566,000 | 171
7 | Thailand | 63,003,000 | 74
8 | Burma | 47,967,000 | 111
9 | Nepal | 27,094,000 | 124
10 | Malaysia | 25,653,000 | 137
11 | Sri Lanka | 19,094,000 | 7
12 | Cambodia | 13,511,970 | 23
13 | Afghanistan | 12,164,970 | 52
14 | Singapore | 4,327,000 | 21
15 | Laos | 2,796,000 | 84
16 | East Timor | 1,067,000 | 19
17 | Bhutan | 637,000 | 25
18 | Brunei | 374,000 | 15
19 | Maldives | 359,000 | 1
Total | | 2,081,904,360 | 2241
Source: Lewis (2009)
Table 1: Population and Number of Living Languages of South and Southeast Asia
The 2241 living languages counted in Table 1 belong to different language families and branches, such as Indo-Aryan, Indo-Iranian, Dravidian, Sino-Tibetan, Austro-Asiatic, Kra-Dai and Hmong-Mien. In terms of population, South Asia and Southeast Asia together account for 34.94% of the world's total population. Some of the languages of these regions have a large number of native speakers: Hindi (5th largest by number of native speakers), Bengali (6th), Punjabi (12th), Tamil (18th), Urdu (20th), etc.
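The totals reported in Table 1 can be re-derived from the per-country rows. The following is a minimal Python sketch of that arithmetic. The figures are copied from the table; the names `ROWS` and `STATED_SHARE` are illustrative, and the world population is not given on this page, so the last value is only what the quoted 34.94% share would imply.

```python
# Rows copied from Table 1: (country, population, living languages).
ROWS = [
    ("India", 1_134_403_000, 438),
    ("Indonesia", 248_496_420, 719),
    ("Pakistan", 158_081_000, 72),
    ("Bangladesh", 153_281_000, 42),
    ("Viet Nam", 85_029_000, 106),
    ("Philippines", 84_566_000, 171),
    ("Thailand", 63_003_000, 74),
    ("Burma", 47_967_000, 111),
    ("Nepal", 27_094_000, 124),
    ("Malaysia", 25_653_000, 137),
    ("Sri Lanka", 19_094_000, 7),
    ("Cambodia", 13_511_970, 23),
    ("Afghanistan", 12_164_970, 52),
    ("Singapore", 4_327_000, 21),
    ("Laos", 2_796_000, 84),
    ("East Timor", 1_067_000, 19),
    ("Bhutan", 637_000, 25),
    ("Brunei", 374_000, 15),
    ("Maldives", 359_000, 1),
]

# Sum the two numeric columns across all rows.
total_population = sum(population for _, population, _ in ROWS)
total_languages = sum(languages for _, _, languages in ROWS)

print(f"Countries: {len(ROWS)}")
print(f"Total population: {total_population:,}")
print(f"Living languages (per-country counts summed): {total_languages}")

# Assumption: 34.94% is the share quoted in the text. The world population
# it implies is derived here, not taken from the source.
STATED_SHARE = 0.3494
implied_world_population = total_population / STATED_SHARE
print(f"World population implied by the stated share: {implied_world_population:,.0f}")
```

Because the language total is a sum of per-country counts, a language spoken in more than one country may be counted more than once.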
WHAT'S NEW
We are planning to organize the 7th Workshop on South and Southeast Asian Natural Language Processing (WSSANLP), which we have proposed as a co-located event at ACL 2019.
Prof. Laurent Besacier, Director of the MSTII Doctoral School and of GETALP (Study Group for Machine Translation and Automated Processing of Languages and Speech), Grenoble Informatics Lab, Université Grenoble Alpes, France, and Prof. Pushpak Bhattacharyya, Director of the Indian Institute of Technology Patna, India, have agreed to chair the 7th WSSANLP.
The 7th WSSANLP will be held in Florence, Italy, from July 28 to August 2, 2019.