Dec 22, 2013

On the Mutual Languages of States

Languages reach beyond the borders of nations. Normally, if two states speak the same language, a cultural connection is implied. A notable example is the US and the UK, both built by English-speaking people. In the real world, one state may speak several languages, each spoken by some portion of the population. To model the language similarity between two states, we introduce a function $\phi(state1, state2)$ such that: 1) $\phi = 1$ if state1 = state2; 2) $\phi(state1, state2) = \phi(state2, state1)$; 3) if no mutual language is spoken in state1 and state2, then $\phi(state1, state2) = 0$.

A very intuitive definition of $\phi$ is the fraction of people, across the two states, who speak any of their mutual languages. Let $p_1, p_2$ be the populations of the two states, and let $f_i^{(1)}, f_i^{(2)}$ be the fractions of each population that speak the $i$-th mutual language. Then
\[ \phi(state1, state2) := \frac{p_1 \sum_i f_i^{(1)} + p_2 \sum_i f_i^{(2)}}{p_1+p_2}.
\] For example, the mutual native languages spoken in the US and UK are English and Angloromani. In the US, the fractions of these two languages are 89% and 0.043%; in the UK, they are 98% and 0.16%. Taking into account the populations of the two states, 308.476 million and 61.899 million, the similarity function $\phi$ yields 0.9085.
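The population-weighted definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `phi_fraction` and the dictionary layout (language name mapped to population fraction) are choices made here, not part of the original calculation, and the inputs are the rounded figures quoted in the text.

```python
def phi_fraction(pop1, langs1, pop2, langs2):
    """Population-weighted fraction of people speaking any mutual language.

    langs1 and langs2 map a language name to the fraction of that
    state's population speaking it (hypothetical data layout).
    """
    mutual = set(langs1) & set(langs2)  # languages present in both states
    speakers1 = pop1 * sum(langs1[lang] for lang in mutual)
    speakers2 = pop2 * sum(langs2[lang] for lang in mutual)
    return (speakers1 + speakers2) / (pop1 + pop2)


# Rounded figures quoted in the text (populations in millions).
us = {"English": 0.89, "Angloromani": 0.00043}
uk = {"English": 0.98, "Angloromani": 0.0016}

value = phi_fraction(308.476, us, 61.899, uk)
print(round(value, 4))
```

With these rounded inputs the sketch gives roughly 0.906, slightly below the 0.9085 quoted above; the gap presumably comes from less rounded data behind the original figure. States with no language in common get 0, as property 3 requires.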

Another interesting definition of $\phi$ (although property 1 is not satisfied) is the probability that two people, one from each state, speak the same tongue. If each person is counted under a single language and the two people are chosen independently, this probability is \[
\phi(state1, state2) := \sum_i f_i^{(1)} f_i^{(2)}.
\] In the case of the US and UK, $\phi$ yields 0.8722.
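A minimal Python sketch of this second definition, again using the rounded US and UK fractions from the text. The name `phi_match` and the dictionary layout are assumptions made for illustration.

```python
def phi_match(langs1, langs2):
    """Probability that one random person from each state share a language.

    Assumes each person is counted under exactly one language and that
    the two people are drawn independently (hypothetical data layout:
    language name -> population fraction).
    """
    mutual = langs1.keys() & langs2.keys()
    return sum(langs1[lang] * langs2[lang] for lang in mutual)


us = {"English": 0.89, "Angloromani": 0.00043}
uk = {"English": 0.98, "Angloromani": 0.0016}

us_uk = phi_match(us, uk)   # dominated by 0.89 * 0.98
us_us = phi_match(us, us)   # a state compared with itself
print(round(us_uk, 4), round(us_us, 4))
```

The US and UK value comes out at about 0.8722, matching the text. Comparing the US with itself gives about 0.79 rather than 1, which is why property 1 fails for this definition.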


With this definition, let's explore the international language connections. Fig. 1 shows the adjacency matrix of the Asian states. Fig. 2 shows its graph representation, with the edges colored according to the language similarity.
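As a rough sketch of how such an adjacency matrix could be assembled, the Python below fills a symmetric matrix from any pairwise similarity function. The states "A", "B", "C", their language fractions, and the helper names are hypothetical toy values, not data from this post; the diagonal is set to 1 by convention, following property 1.

```python
from itertools import combinations


def similarity_matrix(states, phi):
    """Build a symmetric matrix of pairwise language similarities."""
    n = len(states)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        matrix[i][i] = 1.0  # property 1, imposed by convention
    for i, j in combinations(range(n), 2):
        value = phi(states[i], states[j])
        matrix[i][j] = matrix[j][i] = value  # property 2: symmetry
    return matrix


# Hypothetical toy data: language -> fraction of the population.
toy_fractions = {
    "A": {"x": 0.5, "y": 0.5},
    "B": {"x": 1.0},
    "C": {"z": 1.0},
}


def toy_phi(s1, s2):
    """Sum of products of the fractions of each shared language."""
    f1, f2 = toy_fractions[s1], toy_fractions[s2]
    return sum(f1[lang] * f2[lang] for lang in f1.keys() & f2.keys())


states = ["A", "B", "C"]
matrix = similarity_matrix(states, toy_phi)
for name, row in zip(states, matrix):
    print(name, row)
```

Here only A and B share a language, so the single nonzero off-diagonal entry is the A and B pair; C stays isolated once self-loops are dropped.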

Fig. 1, the language connections among Asian states: the adjacency matrix.

Fig. 2, the language connections among Asian states: the graph (We have removed the self-loops). The edges are colored according to their values.
Fig. 3 again shows the language connections among Asian states, but the edges are colored by the primary mutual language shared by the two states they connect. The red edges in Fig. 3 represent English. Compared with Fig. 2, English is not very popular in Asia, but it is still widely used.
Fig. 3, the language connections among Asian states: the graph edges are colored according to the type of primary mutual languages shared by each pair of states. 
Notice that some states enjoy closer language relations with their neighbours than others. We can define the language capacity of a state $s$ within some group $G$ of states: \[
LC(s) := \frac{1}{|G|-1}\sum_{s' \in G, s'\ne s} \phi(s, s')
\] Fig. 4 shows the language capacities of Asian states.
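The language capacity is simply the mean similarity between a state and every other member of the group. A small Python sketch follows; the group of states "A", "B", "C" and their pairwise similarities are hypothetical toy values chosen for illustration, not values from the figures.

```python
def language_capacity(state, group, phi):
    """Mean similarity between `state` and every other state in `group`.

    Assumes `state` is a member of `group` and that states are unique,
    so the number of other states is |G| - 1.
    """
    others = [s for s in group if s != state]
    return sum(phi(state, other) for other in others) / len(others)


# Hypothetical pairwise similarities, keyed by unordered pair.
toy_similarity = {
    frozenset({"A", "B"}): 0.5,
    frozenset({"A", "C"}): 0.0,
    frozenset({"B", "C"}): 0.25,
}


def toy_phi(s1, s2):
    return toy_similarity[frozenset({s1, s2})]


group = ["A", "B", "C"]
capacities = {s: language_capacity(s, group, toy_phi) for s in group}
print(capacities)
```

In this toy group, B has the largest capacity because it shares something with both other states, while C, which is connected only weakly to B, has the smallest.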

Fig. 4, the language capacity of the states in Asia. The two countries with the largest language capacities are China and Singapore.
We can gather further information from the language relation graph by splitting the vertices into graph communities. Using the FindGraphCommunities function in Mathematica, we identify 7 communities. The first community is Southeast Asia, including China, Singapore, Brunei, etc. Its most popular language is Chinese and its most widely used language is English. The second and third communities are the crossroads countries, including Russia, the Central Asian "stans", Iran, Iraq, Syria, Israel, etc. Their most popular and most widely used languages include Russian and Kurdish. The fourth community is South Asia, including India, Pakistan, Bangladesh, Nepal, Bhutan and Myanmar. Its most popular language is Hindi and its most widely used language is Tibetan. The fifth community is the Arabic states, where Arabic is both the most popular and the most widely used language. The next community is Japan and the Koreas. Its most popular language is Japanese and its most widely used language is Korean. The last community is Turkey and Uzbekistan. Both countries speak Turkish.
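To give a feel for what splitting a weighted graph into groups involves, here is a deliberately simplified Python stand-in. It is not the algorithm behind FindGraphCommunities: it drops edges below a similarity threshold and returns the connected components of what remains. The states "A" to "E", their edge weights, the threshold, and the function name are all hypothetical.

```python
def threshold_components(weights, threshold):
    """Group states linked by a chain of edges with weight >= threshold.

    `weights` maps a pair (state_a, state_b) to their similarity.
    A crude substitute for real community detection.
    """
    neighbours = {}
    for (a, b), weight in weights.items():
        neighbours.setdefault(a, set())
        neighbours.setdefault(b, set())
        if weight >= threshold:  # keep only strong language ties
            neighbours[a].add(b)
            neighbours[b].add(a)

    seen, groups = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:  # depth-first walk over the strong ties
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbours[node] - group)
        seen |= group
        groups.append(group)
    return groups


# Hypothetical similarities between five made-up states.
toy_weights = {
    ("A", "B"): 0.8,
    ("B", "C"): 0.6,
    ("C", "D"): 0.05,
    ("D", "E"): 0.9,
}

groups = threshold_components(toy_weights, threshold=0.5)
print(groups)
```

The weak C and D tie is cut, leaving two groups: A, B, C and D, E. A modularity-based method would not need a hand-picked threshold, which is one reason to prefer a dedicated community-detection routine for real data.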

Fig. 5, the language communities of the Asian states. The edges are colored by community.


Europe is another continent with flourishing civilizations. The average language capacity in Europe is higher than in Asia.

While the languages spoken in Asia are very diverse, this is not true of other continents. Europe, for example, is dominated by French, German, Italian, English, Romanian, etc.
Finally, the world is dominated by four languages: English, French, Spanish and Portuguese. They may not be the most popular ones, but they are widely used in communicating with other states. Apparently, a language spoken beyond the boundaries of its country is evidence of cultural influence. As we know, the four dominant "foreign languages" are the result of colonialism.
