Principal Investigators:
David Harmon, M.S.
Jonathan Loh, M.Sc.

Text: © Terralingua 2004


Appendix:


MEASURING CULTURAL DIVERSITY—GREENBERG’S INDICES AND ELFs


Greenberg’s Linguistic Diversity Indices

In an influential paper published in 1956, the linguist Joseph Greenberg proposed eight indices of linguistic diversity. The indices are based on the probability of two members of a population sharing the same language. Greenberg identified the indices with the letters A through H. Briefly, the indices are as follows:

• The A Index (Monolingual Nonweighted Method) measures the probability that two randomly chosen individuals from a population in a given area will speak different languages.
• The B Index (Monolingual Weighted Method) modifies the A Index by adding a multiplier (the “resemblance factor”) to weight the index so that it registers a higher diversity for areas that have a number of unrelated or distantly related languages than for those having the same number of closely related languages. The resemblance factor is calculated by comparing a fixed basic word list for each language.
• The A and B indices are called “monolingual” because each individual is counted as a speaker of only one language, his or her dominant one. The C (Split-Personality Nonweighted Method) and D indices (Split-Personality Weighted Method) account for polylingualism by counting speakers of two languages as two people, of three languages as three people, and so on; the D Index is weighted in the same manner as in B.
• The E (Random Speaker Nonweighted Method) and F indices (Random Speaker Weighted Method) suppose that an individual is chosen from the population at random and that, if polylingual, it is equally probable that he or she will speak any one of the languages. A second speaker is chosen the same way. The probability of their speaking the same language is the E Index; the probable measure of resemblance among their languages is the F Index.
• The G Index (Random Speaker–Hearer Method) measures the probability that a randomly chosen individual is likely to understand the language spoken by a second individual if that individual is polylingual and is equally likely to speak any of the languages at his or her command.
• The H Index (Index of Communication) is the probability that two randomly chosen members of a population will have at least one language in common.

While indices B through G are methodologically the most sophisticated and potentially the most descriptive of real-world conditions (especially in regions of the world where polylingualism is still prevalent), they present great difficulties in terms of obtaining the necessary data to calculate them on anything but the smallest scales. The B Index requires the calculation of a resemblance factor for each language in question through a series of comparisons of basic vocabulary lists. Greenberg himself acknowledged that the indices measuring polylingualism (C, D, E, F, and G) take no account of the individual’s relative command of each language he or she knows, and, in any event “satisfactory measures have not as yet been developed, and, if developed could hardly be applied on a scale which would allow them to be ascertained for an entire population” (Greenberg 1956 [1971:73]). Because of these considerations, we will disregard indices B through G and continue with a discussion of A and H only.

The A Index, again, measures the probability that two randomly chosen individuals from the population in a given area will speak different languages. It is calculated according to the following formula:

$$A = 1 - \sum_i i^2$$

where i successively takes on the value of the proportion of speakers of a given language to the entire population of the area in question. The sum of these squared proportions is subtracted from 1 because we are measuring diversity rather than uniformity, so that lower values represent lower diversity. Hence, 0 represents the extreme situation in which everyone in a given area speaks the same language, and 1 the opposite situation, in which everyone in a given area speaks a different language. Greenberg gives a simple example. If, in a given area, one-eighth of the population speak language M, three-eighths speak N, and one-half speak O, and if the proportion of speakers of the three languages to the total population is designated m, n, and o, respectively, then the A value for that area is:

$$A = 1 - (m^2 + n^2 + o^2)$$
$$A = 1 - (0.125^2 + 0.375^2 + 0.5^2)$$
$$A = 1 - (0.015625 + 0.140625 + 0.25) = 0.59375$$

which is conventionally truncated to three (or two) decimal places. In this example, then, A = 0.593.
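The arithmetic above is easy to reproduce in code. The following is a minimal Python sketch that assumes monolingual counting (each person contributes to exactly one language share). The helper names `greenberg_a` and `truncate` are hypothetical, not part of any published tool; truncation goes through `decimal` so that binary floating-point representation cannot shave off a digit.

```python
import math
from decimal import Decimal, ROUND_DOWN


def greenberg_a(shares):
    """Greenberg's A index: 1 minus the sum of squared language shares.

    Each value in `shares` is the proportion of the population whose
    (single, dominant) language is a given language; they should sum to 1.
    """
    total = sum(shares)
    if not math.isclose(total, 1.0):
        raise ValueError(f"shares must sum to 1, got {total}")
    return 1.0 - sum(p * p for p in shares)


def truncate(x, places=3):
    """Truncate x to `places` decimal places (drop digits, do not round)."""
    quantum = Decimal(10) ** -places  # e.g. Decimal('0.001') for places=3
    return float(Decimal(str(x)).quantize(quantum, rounding=ROUND_DOWN))


# Greenberg's example: m = 1/8, n = 3/8, o = 1/2
a = greenberg_a([1 / 8, 3 / 8, 1 / 2])
print(a)            # 0.59375
print(truncate(a))  # 0.593
```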

The H Index, again, measures the probability that two randomly chosen members of a population will have at least one language in common. “This measure,” Greenberg notes, “indicates the actual possibility of communication among any two people taken at random. As such it is the most responsive of all to such phenomena as the spread of auxiliary [i.e., non-mother tongue] languages” (Greenberg 1956 [1971:72]). As a nonweighted index, H is obtained in the same general manner as A, except the population is now divided into proportions of speakers of any one language only or any particular combination of languages. Then the products of each pair of such proportions are calculated, and “the proportions are multiplied by 1 if there is at least one common language among the languages corresponding to each factor in the product.... The sum of these products is then obtained. In this case, it seems advisable not to subtract from 1: if we did, we would have an index of noncommunication” (Greenberg 1956 [1971:72]).

Greenberg gives this example. Let there be five languages, M, N, O, P, and Q. In a given area, one-fifth of the population speaks M only, one-tenth speaks N only, one-tenth speaks O only, one-tenth speaks P only, one-tenth speaks Q only, one-tenth speaks both M and N, one-tenth speaks both M and O, one-tenth speaks both M and P, and one-tenth speaks both M and Q. First, the products of all possible pairs of these proportions are calculated. Then, as noted above, each product is multiplied by 1 if there is at least one common language among the languages corresponding to its two factors (and by 0 otherwise), and the products are summed. When all is said and done, the H value in this example is 0.480 (Greenberg 1956 [1971:72-73]).
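A minimal Python sketch of the nonweighted H calculation follows. It assumes that each sub-population is described by the set of languages it speaks and that the sum runs over all ordered pairs of sub-populations, including each one paired with itself (that is, the two people are drawn with replacement). Under that assumption it reproduces Greenberg's value of 0.480. The function name `greenberg_h` is hypothetical.

```python
from itertools import product


def greenberg_h(groups):
    """Greenberg's nonweighted H index (index of communication).

    `groups` maps each repertoire (a frozenset of languages) to the
    proportion of the population that speaks exactly that set of languages.
    H sums p_i * p_j over all ordered pairs of repertoires, including each
    repertoire paired with itself, counting a pair only when the two
    repertoires share at least one language (multiplier 1, otherwise 0).
    """
    return sum(
        p_i * p_j
        for (langs_i, p_i), (langs_j, p_j) in product(groups.items(), repeat=2)
        if langs_i & langs_j  # non-empty intersection = a common language
    )


# Greenberg's example with languages M, N, O, P, Q
example = {
    frozenset({"M"}): 0.2,
    frozenset({"N"}): 0.1,
    frozenset({"O"}): 0.1,
    frozenset({"P"}): 0.1,
    frozenset({"Q"}): 0.1,
    frozenset({"M", "N"}): 0.1,
    frozenset({"M", "O"}): 0.1,
    frozenset({"M", "P"}): 0.1,
    frozenset({"M", "Q"}): 0.1,
}
print(round(greenberg_h(example), 3))  # 0.48
```

The bilingual M groups drive the result: every group that includes M can talk to every other such group, which accounts for 0.36 of the 0.48.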

How do A and H indices compare when calculated from real-life data? Greenberg provides an example using 1930 census data from Mexico (Table A-1). In this example, the relationship between A and H is largely, but not perfectly, inverse. The five highest-ranking states on the A Index rank 30th, 26th, 24th, 29th, and 27th, respectively, on the H Index, while the five highest-ranking states on the H Index rank 28th and 27th, 24th, and 23rd and 30th, respectively, on the A Index.

Of the eight indices Greenberg proposed, the A Index has found the most favor because it is the simplest and most straightforward to calculate. Because it can potentially be derived from census data, the A Index offers the possibility of tracking changes in linguistic diversity over time. Lieberson and co-workers did just this in a study of 34 countries that used census data dating from 1880 to 1970 (Lieberson et al. 1975 [1981]). Table A-2 shows the results, and extends them by including the A Index calculated for these countries in the 2000 edition of Ethnologue (Grimes 2000). The three countries whose A increased the most over the period were Italy, Estonia, and Sweden, and the three whose A decreased the most were Poland, Romania, and Bulgaria. Lieberson et al. 1975 [1981] offered some hypotheses to explain this change.

The Greenberg A Index would seem, then, to be a good candidate for the linguistic diversity component of the IBCD. However, the matter is not so simple. It could be argued that the A Index actually gauges linguistic concentration rather than linguistic diversity. Because it is based on the probability that two individuals will speak the same language, the A Index is driven down in countries where one language dominates, even if many small languages are also spoken there. Table A-3 illustrates this by listing the A Index for countries with 50 or more endemic languages (as calculated from data in Grimes 2000). The countries with the lowest A indices are those dominated by a single (usually colonial) language. Not surprisingly, these countries also tend to have the highest numbers of recorded endemic language extinctions. Indeed, the A Index for a particular country might be considered a rough indicator of the likelihood (or at least the risk) that its endemic languages could become extinct.
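To make the concentration effect concrete, the sketch below compares two hypothetical countries. The shares are invented for illustration and are not drawn from Grimes 2000. One country has a dominant language spoken by 90% of the population plus 99 small endemic languages; the other has only five equally sized languages.

```python
def a_index(shares):
    """Greenberg's A index: 1 minus the sum of squared language shares."""
    return 1.0 - sum(p * p for p in shares)


# Hypothetical country 1: one language spoken by 90% of the population,
# plus 99 small endemic languages splitting the remaining 10% equally.
dominated = [0.9] + [0.1 / 99] * 99

# Hypothetical country 2: just five languages with 20% of speakers each.
balanced = [0.2] * 5

print(len(dominated), round(a_index(dominated), 3))  # 100 0.19
print(len(balanced), round(a_index(balanced), 3))    # 5 0.8
```

The first country has twenty times as many languages but scores far lower, because the single squared share of the dominant language (0.81) swamps the 99 tiny terms.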

What this suggests is that the A Index accurately reflects endemic language richness only in countries where no single language predominates. In other words, the A Index does not capture the full linguistic richness of many countries that have small endemic languages. It may, on the other hand, capture the language richness of a linguistically heterogeneous country whose diversity derives from an influx of immigrant languages. Note in Table A-3 that the A values for Canada and the USA are relatively high compared with that of the third country in North America, Mexico. We might surmise that this is because Canada and the USA have much higher immigration rates from different linguistic source-communities around the world than does Mexico.



Rae and Taylor’s Cross-Cutting (XC) Index

In the terminology of political science, “cleavages” are lines of division within a community, whether they be religious, interest-group, voting, or any number of others. “Cross-cutting” is “the extent to which individuals who are in the same group on one cleavage are in different groups on the other cleavage” (Rae and Taylor 1970, 23, 92). Hypothesizing a simplified political situation in which there are two cleavages, $X_1$ and $X_2$, Rae and Taylor devised a Cross-Cutting (XC) Index that is expressed as follows:

$$XC = \frac{A + B}{N'(N' - 1)/2}$$

where A is the number of pairs whose members are in the same group of $X_1$ but in different groups of $X_2$, and B is the number of pairs whose members are in different groups of $X_1$ but in the same group of $X_2$. The total number of pairs is $N'(N' - 1)/2$, with $N'$ being the number of individuals in the overlap. Calculating A and B is a matter of constructing a matrix (called a “contingency table”) with the two cleavages and then summing all the pairwise comparisons across the rows and down the columns of the matrix, in a manner similar to that used to calculate H above.
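The pair counting can be sketched directly in Python. This version is an illustrative assumption, not Rae and Taylor's own procedure. It enumerates every pair of individuals instead of working from a contingency table, which makes it simpler to read but quadratic in population size, so tabulated counts would be preferable for real data. The group labels in the example are hypothetical.

```python
from itertools import combinations


def cross_cutting(individuals):
    """Rae and Taylor's XC index for two cleavages X1 and X2.

    `individuals` lists one (group_on_X1, group_on_X2) tuple per person in
    the overlap. A counts pairs in the same X1 group but different X2 groups;
    B counts pairs in different X1 groups but the same X2 group.
    """
    n = len(individuals)
    if n < 2:
        raise ValueError("need at least two individuals to form a pair")
    a = b = 0
    for (x1_i, x2_i), (x1_j, x2_j) in combinations(individuals, 2):
        if x1_i == x1_j and x2_i != x2_j:
            a += 1
        elif x1_i != x1_j and x2_i == x2_j:
            b += 1
    return (a + b) / (n * (n - 1) / 2)


# Hypothetical labels: X1 = religion (R1/R2), X2 = language (L1/L2).
cross_cut = [("R1", "L1"), ("R1", "L2"), ("R2", "L1"), ("R2", "L2")]
reinforcing = [("R1", "L1"), ("R1", "L1"), ("R2", "L2"), ("R2", "L2")]

print(round(cross_cutting(cross_cut), 3))  # 0.667
print(cross_cutting(reinforcing))          # 0.0
```

In the reinforcing case every religious boundary coincides with a linguistic one, so no pair is counted in A or B and the index is 0.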

The advantage of the XC Index is that it is a quantitative representation of an important phenomenon, for cross-cutting “can have crucial consequences for the intensity of feelings generated” among ethnic groups: political theory holds that the presence of cross-cutting will result in a moderation of attitudes and actions, as opposed to cross-reinforcing cleavages, which tend to intensify them (Yeoh 2001, 13). But as Yeoh also points out, “it is practically impossible to measure such complex links” because XC would require a detailed field survey to determine the proportion of members of one type of ethnic group who also belong to other types of groups (Yeoh 2001, 13).

This point can be generalized to all indices of cultural diversity at larger scales (e.g., the country level). While it is possible to devise indices (both weighted and unweighted) that mathematically represent the complex realities of ethnic, religious, and linguistic interaction that truly reflect cultural diversity, the computations quickly become so complicated (when more than two or three indicators/cleavages are taken into account), and the data needed to quantify the indicators/cleavages so difficult to gather, that calculating such complex indices is impractical. Paradoxically, then, the only tenable way to represent a complex cultural reality is by means of a pared-down measure that borders on oversimplification. In his book The Wellbeing of Nations, Robert Prescott-Allen hints at this paradox in his discussion of the three criteria for an ideal indicator. The indicator must be representative: it should cover the most important aspects of what is being measured while showing trends over time and differences between places and groups of people. It must be reliable: based on accurate data gathered with sound and consistent sampling procedures. And it must be feasible: based on data that are already available or can be gathered at a reasonable cost (Prescott-Allen 2001, 280-281). Complex indices that show interactions between multiple factors, such as Greenberg’s B–G indices and Rae and Taylor’s XC Index, meet (or could conceivably meet) the first two criteria but will probably fail badly on the third. The major constraint on building a comprehensive cultural diversity index, therefore, is the feasibility of gathering the data needed.

All this notwithstanding, the fact that we have to fall back on simple indices does not mean that these indices are worthless. Again, it is paramount to keep in mind the purposes of the IBCD (and other similar global-level indices) and to not overstate what they signify.


Ethnolinguistic Fractionalization Indices

Ethnolinguistic fractionalization indices (ELFs) are very similar to Greenberg’s diversity indices. As the name suggests, ELFs measure the probability that two people from a given area, chosen at random, will be from different ethnic groups. ELFs are calculated using a Herfindahl concentration formula (as it is known in political science):

$$ELF = 1 - \sum_i \left(\frac{n_i}{N}\right)^2$$

where $n_i$ is the number of members of the $i$th group and $N$ is the total number of people in the population. This formula is almost identical to Greenberg’s.
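The following is a minimal sketch of the ELF calculation. It assumes the common squared-share form $ELF = 1 - \sum_i (n_i/N)^2$, and the group counts in the example are hypothetical. The function name `elf` is likewise illustrative.

```python
def elf(counts):
    """Ethnolinguistic fractionalization, 1 - sum((n_i / N) ** 2).

    `counts` holds n_i, the number of members of each group; N is their total.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("total population must be positive")
    return 1.0 - sum((n / total) ** 2 for n in counts)


# Hypothetical population of 1,000 people in three groups.
print(round(elf([500, 300, 200]), 2))  # 0.62
# A single group yields no fractionalization at all.
print(elf([1000]))                     # 0.0
```

Because $n_i/N$ is simply group $i$'s population share, in this form the calculation mirrors Greenberg's A, with ethnic groups in place of languages.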

Using data from one of the 20th century’s landmark compendiums of culture-group data, the Atlas Narodov Mira, or Atlas of the Peoples of the World (State Geological Committee of the Soviet Union 1964), Taylor and Hudson included an ELF for 129 countries in the first edition of their Handbook of Political and Social Indicators (1972). This ELF has subsequently seen wide use in the fields of political science and econometrics. Political scientists want to know what role ethnic diversity plays in decision-making, while economists focus on the effect of national-level ethnic diversity on economic growth and development. In political science analyses, a typical conclusion is that ethnic heterogeneity tends to hamper efficient government because the ethnic group(s) in power tend to focus on redistributing public goods to their compatriots rather than on making sound policy (e.g., La Porta et al. 1999; Collier 1999). In econometric analyses, a typical conclusion is that ethnic fragmentation explains the presence of social characteristics (such as low levels of formal schooling and political instability) that negatively affect economic growth (e.g., Easterly and Levine 1997).

In a critique of the use of conventional ELFs, Posner (2000) highlights a fundamental methodological issue that bears upon the IBCD. While not taking issue with the basic conclusions of the above-cited studies, Posner notes that the ELF is designed to measure, not ethnolinguistic diversity per se, but politically salient ethnolinguistic diversity. Yet the Atlas Narodov Mira data covers (or purports to cover) all ethnolinguistic groups, no matter how small. This leads to what Posner calls the “problem of inclusion”:

This is the problem of enumerating dozens of groups in each country that may be culturally or linguistically distinct from their neighbors but that are irrelevant as political actors in their own right. In some cases, this is because these groups fold themselves into broader political coalitions when it comes to competing over resources and national-level policy outcomes. In other instances, it is because they simply do not participate in politics as distinct, recognizable groups.... My assertion is not that the many ethnic groups included in the Atlas are unimportant per se. Rather, my claim is that these groups are unimportant for the political mechanism that the ELF measure is being used to test.... In all of these models [i.e., those of Collier 1999, La Porta et al. 1999, and Easterly and Levine 1997], the relevant competition is that between mobilized, ethnically-defined interest groups. To capture the contribution that a country’s ethnic heterogeneity makes to such a process requires an index of fractionalization that reflects the groups that are actually doing the competing. The problem with the ELF index is that it does not do this: it includes dozens of groups that are irrelevant to the process that it is employed to capture (Posner 2000).

As an alternative, Posner proposes a PREG (Politically Relevant Ethnic Group) Index that cuts out small groups that are considered either not distinct or not important political actors at the national level.

A second problem with ELFs, says Posner, is that they convey “no information about the depth of the divisions that separate members of one group from another.”

Yet this factor certainly matters. In practice, what shapes the public policies selected by governments is often not so much the number or comparative sizes of the ethnic groups in the political system as the depth of the divisions between them. To take one example, the probability of randomly choosing two people from different ethnic groups in Malaysia and Switzerland (or, according to the values in the ELF index, Israel and Kuwait!) is roughly equivalent. But few people would claim that the salience of ethnicity for explaining economic policy choices in these countries is comparable. By not capturing the depth of the ethnic cleavages, indices of ethnic fractionalization leave out a potentially important part of the explanation (Posner 2000).

Finally, ELFs assume that a group’s political relevance is proportional to its share of the population, which is rarely the case, and they give no sense of shifts in political power over time.

Fractionalization Indices versus Diversity Indices

For our purposes, the important point Posner raises is that the aim of ELFs, and of any fractionalization index, is not the same as the purpose of the IBCD. What Posner rightly criticizes as a shortcoming of ELFs in political-science or econometric analysis is, in the context of the IBCD, a strength: we want to include the full range of discrete cultural groups, whether or not they are “relevant,” “powerful,” or closely related in terms of their views and beliefs.

Posner gives a “glaring example of problematic inclusion” that illustrates the point. He notes that the Atlas Narodov Mira includes a miscellaneous category of “others and unknowns,” which is incorporated in the fractionalization calculation of every country in the Taylor-Hudson ELF dataset. “In most cases,” Posner notes, “this ‘group’ is so small relative to the rest of the population that its inclusion has little effect on the resultant ELF value. But in a handful of cases, the mistaken inclusion of ‘others and unknowns’ in the calculation does have a marked effect: in the Seychelles, for example, where the Atlas identifies just two groups (‘French’ and ‘others and unknowns’) [and] including ‘others and unknowns’ raises the country’s fractionalization index value from 0 to 0.33.” From the IBCD’s perspective, the inclusion of “others and unknowns” is anything but a mistake, and the ELF value of 0.33 is more valid for the analysis of BCD than the value of 0.00.
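To see why a single residual category can move the index this much, take a country with just two recorded groups, the smaller of which holds a share $p$ of the population. The following worked sketch assumes that the Taylor-Hudson values follow the squared-share formula $ELF = 1 - \sum_i (n_i/N)^2$ and ignores rounding in the reported 0.33:

$$ELF = 1 - \left[(1 - p)^2 + p^2\right] = 2p(1 - p)$$

$$2p(1 - p) = 0.33 \;\Rightarrow\; p = \frac{1 - \sqrt{1 - 2(0.33)}}{2} \approx 0.21$$

On these assumptions, the reported value implies that the smaller of the two recorded groups makes up roughly a fifth of the population. Dropping either group leaves a single share of 1, and the index collapses to 0.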

In sum, we can say that fractionalization indices are fundamentally different from diversity indices in terms of what they are trying to accomplish. Fractionalization indices are designed to test the effects of diversity on cleavages and therefore concern themselves with establishing the differences between politically relevant ethnic groups. Probability indices do not give enough weight to broad diversity embedded in the small, politically powerless ethnolinguistic groups. Yeoh (2001) says that it is beneficial to use different criteria to calculate an ELF (which is what he does in calculating his Ethnic Fractionalization Index), whereas Posner criticizes doing that. For our purposes, Yeoh’s approach is right.

How one measures diversity depends on how one defines diversity. Fractionalization indices define diversity in terms of the probability that two people share a characteristic, so the index value drops the more disproportionately a single characteristic dominates within a country. Consider the imbalance between the great number of languages that are each spoken by only a small number of people and the very few largest languages whose mother-tongue speakers make up over 90% of the world’s people. We then have to ask how diversity can truly be measured in such a situation. Here, it seems that endemism is a key to reckoning diversity on a global scale, because numerically small endemic languages (or species) are equivalent to the largest, most widespread languages (or species) in terms of the distinct, unique contribution they make to the overall “sum total” of diversity worldwide.

Text © 1997-2009 Terralingua. All rights reserved.
Terralingua is a 501(c)(3) non-profit organization registered under U.S.A. tax laws (38-3291259).
Terralingua logo © 1998-2009 Fausto Bonasera and Anna Maffi.
Photographs © 2009 Cristina Mittermeier, Anna Maffi, David Rapport
Website design by o r t i x i a.