Effective Number of Species is Diversity

Effective number of species

Diversity indices like the Shannon entropy ("Shannon-Wiener index") and the Gini-Simpson index are not themselves diversities. They are just indices of diversity, in the same way that the diameter of a sphere is an index of its volume but is not itself the volume. Using the diameter in place of the volume in engineering equations would give dangerously misleading results. Things would be even worse if some engineers liked to use surface area, and if others liked to use circumference in place of volume. Imagine the chaos if they called all of these things by the same word and used them interchangeably in engineering equations that required volume. This is what biologists are doing with diversity indices.

Diversity indices have a wide variety of ranges and behaviors; if applied to a system of S equally common species, some give S, some give log S, some give 1/S, some give 1–1/S, etc. Some have unlimited ranges while others are always less than unity. By calling all of these indices “diversities” and treating them as if they were interchangeable in formulas or analyses requiring diversities, we will often generate misleading results.
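The spread of behaviors is easy to check numerically. The following is a minimal sketch: the function name `index_values` and the particular four indices chosen (species richness, Shannon entropy, the sum of squared frequencies, and the Gini-Simpson index) are illustrative assumptions, not something taken from the text above.

```python
import math

def index_values(S):
    """Return several common indices for S equally common species."""
    p = [1.0 / S] * S                               # every species has frequency 1/S
    richness = len(p)                               # gives S
    shannon = -sum(pi * math.log(pi) for pi in p)   # gives log S (natural log)
    simpson = sum(pi * pi for pi in p)              # gives 1/S
    gini_simpson = 1.0 - simpson                    # gives 1 - 1/S
    return richness, shannon, simpson, gini_simpson

for S in (8, 16):
    richness, shannon, simpson, gini_simpson = index_values(S)
    print(S, richness, round(shannon, 4), round(simpson, 4), round(gini_simpson, 4))
```

Of the four values printed, only species richness doubles when S goes from 8 to 16.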

So what is a true "diversity"? What units should it be measured in?

It is possible to arrive at a natural and intuitive definition. In virtually any biological context, it is reasonable to say that a community with sixteen equally-common species is twice as diverse as a community with eight equally-common species. This is so obvious that it seems odd to have to write it. But it is important to realize what this simple statement implies. Most diversity indices do not double as we go from eight species to sixteen species. (Some biologists have noticed this and concluded that all diversity indices other than species richness are therefore not to be trusted. We will see below that this is an incorrect conclusion. Species richness is the least informative and most imprecise diversity index, in the sense that it is more subject to random variation than any other index. Frequency-based diversity indices tell us something important, but they are not themselves "diversities".)

Going back to the obvious, it seems completely natural to say that a community with eight equally-common species has a diversity of eight species, or a community with S equally-common species has a diversity of S species. This definition behaves as we expect of a diversity; the diversity of a community of sixteen equally-common species is double that of a community with eight equally-common species. Diversity is an unambiguous concept when we are dealing with communities of equally-common species.

What happens when the species aren't equally common? This is where the choice of diversity index comes into play. If we choose a particular index as our index of diversity, then any two communities that give the same value of the index must have the same diversity. Any two communities with a Shannon entropy (Shannon-Wiener index) of 4.5 have the same diversity, according to this index. We don't know what that diversity is yet (remember, 4.5 is just the value of the index, not the real diversity) but we do know that all communities with a Shannon-Wiener index of 4.5 have the same diversity according to this index. Now if one of those communities consisted of S equally-common species, we would know that its true diversity is S by our above definition, and then we would know that all other communities with a Shannon-Wiener index of 4.5 must also have diversity S, even if their species were not equally common.

It is a matter of algebra to find the number of equally-common species that give a particular value of an index. See my paper, Entropy and Diversity, for a description of the algorithm. The number of equally-common species required to give a particular value of an index is called the "effective number of species". This is the true diversity of the community in question. For example, the true diversity associated with a Shannon-Wiener index of 4.5 is exp(4.5) ≈ 90 effective species. The formulas that convert common diversity indices into true diversities are collected in Table 1.
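For the two indices discussed here the algebra is short. S equally-common species give a Shannon entropy of log S, so S = exp(H); they give a Gini-Simpson index of 1 - 1/S, so S = 1/(1 - index). The sketch below applies both conversions; the function names and the four-species example community are hypothetical, chosen only for illustration.

```python
import math

def shannon_entropy(p):
    """Shannon entropy (natural log) of a list of species frequencies."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def gini_simpson(p):
    """Gini-Simpson index: 1 minus the sum of squared frequencies."""
    return 1.0 - sum(pi * pi for pi in p)

def diversity_from_shannon(h):
    """Number of equally common species whose Shannon entropy is h."""
    return math.exp(h)

def diversity_from_gini_simpson(g):
    """Number of equally common species whose Gini-Simpson index is g."""
    return 1.0 / (1.0 - g)

# Hypothetical community of four species with unequal frequencies.
p = [0.5, 0.25, 0.125, 0.125]
print(diversity_from_shannon(shannon_entropy(p)))      # effective species via Shannon
print(diversity_from_gini_simpson(gini_simpson(p)))    # effective species via Gini-Simpson
print(diversity_from_shannon(4.5))                     # the example in the text, about 90
```

For four equally-common species both conversions return 4. For the uneven community above, both return fewer than 4 effective species, and the two indices disagree about how many fewer.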

Converting indices to true diversities (effective numbers of species) gives them a set of common behaviors and properties. After conversion, diversity is always measured in units of number of species, no matter what index we use. This lets us compare and interpret them easily, and it lets us develop formulas and techniques that don't depend on a specific index. It also lets us avoid the serious misinterpretations spawned by the nonlinearity of most diversity indices. For more details see What is diversity?, the first chapter of a book on diversity analysis that Dr. Anne Chao and I are writing under contract for Chapman and Hall publishers.

As an example of the practical importance of this, suppose you are comparing the diversity of aquatic microorganisms before and after an oil spill. You wouldn't want to measure that diversity by species richness, because even a massive toxic event is sure to leave a few vagrant individuals of each pre-spill species, and species richness doesn't distinguish between one individual of Species X and a million. The pre- and post-spill species counts might not be very different, even if the pre- and post-spill species frequencies are very different. So if you are a good traditional biologist you might use the popular Gini-Simpson diversity index, which is 1 - (sum of the squares of the species frequencies). Suppose that the pre-spill Gini-Simpson index is 0.99 and the post-spill index is 0.97. You would figure out that this drop is statistically significant, but you would conclude that the magnitude of the drop is small. You might even say (very wrongly) that the diversity has dropped by 2%, which sounds like a small drop, nothing to worry about.

The error, which virtually all biologists make, is to treat the Gini-Simpson index as if it were itself a diversity. It is not, and it is highly nonlinear. The pre-spill community with a Gini-Simpson index of 0.99 has the same diversity as a community of 1/(1 - 0.99) = 100 equally-common species. The post-spill community with a Gini-Simpson index of 0.97 has the same diversity as a community of 1/(1 - 0.97) ≈ 33 equally-common species. The difference between the pre- and post-spill diversities is in fact enormous. The drop in diversity is about 67%, not 2%! This is not just a matter of different definitions of diversity, as some people would like to say. Rather, it is a matter of the indices being nonlinear with respect to our intuitive concept of diversity.
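The arithmetic behind the oil-spill comparison can be sketched in a few lines. The helper name `effective_species` is a hypothetical label; the conversion it uses, 1/(1 - index), follows from the fact that S equally-common species have a Gini-Simpson index of 1 - 1/S.

```python
def effective_species(gini_simpson):
    """Number of equally common species with the given Gini-Simpson index."""
    return 1.0 / (1.0 - gini_simpson)

pre_index, post_index = 0.99, 0.97

pre_diversity = effective_species(pre_index)      # about 100 species
post_diversity = effective_species(post_index)    # about 33 species

# Relative drop measured on the raw index versus on the true diversity.
index_drop = (pre_index - post_index) / pre_index
diversity_drop = (pre_diversity - post_diversity) / pre_diversity

print(f"raw index drop:      {index_drop:.1%}")
print(f"true diversity drop: {diversity_drop:.1%}")
```

The raw index falls by roughly 2% while the effective number of species falls by roughly two thirds.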

The Shannon entropy is also highly nonlinear. A Shannon entropy of 6.0 corresponds to exp(6.0) ≈ 403 equally-common species, while a Shannon entropy of 5.5 corresponds to exp(5.5) ≈ 245 equally-common species. The former is about 1.65 times as diverse as the latter, even though the values of the indices differ by only 8%.
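One way to see the nonlinearity is to tabulate the conversion at evenly spaced entropies. This is only a sketch; the function name `diversity_ratio` and the chosen entropy values are illustrative assumptions.

```python
import math

def diversity_ratio(h_high, h_low):
    """Ratio of the true diversities implied by two Shannon entropies."""
    return math.exp(h_high - h_low)

# Equal steps in entropy multiply the effective number of species
# by the same factor each time, rather than adding a fixed amount.
for h in (4.5, 5.0, 5.5, 6.0):
    print(h, round(math.exp(h), 1))

print(round(diversity_ratio(6.0, 5.5), 3))   # factor for a step of 0.5
```

Each step of 0.5 in Shannon entropy multiplies the effective number of species by exp(0.5), whatever the starting value.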

There may be times when we really want to know the information content of communities, and in that case we would use the Shannon entropy directly. Similarly, there might be times when we really want to use the Gini-Simpson index directly. But when we are doing diversity analyses, we have to convert these indices to true diversities if they are to serve their purpose.

When we convert to true diversities (effective numbers of species) we create a powerful and intuitive tool for comparing the diversities of different communities. If one community has a true diversity of 5 effective species based on some diversity index, and another has a true diversity of 15 effective species based on the same diversity index, we can truly say that the second community is three times as diverse as the first according to that index. We couldn't draw this conclusion from the raw index itself, because it uses a nonlinear scale.

Other sciences long ago recognized the importance of the true diversity of a diversity index, though the concept goes by different names in different fields. The use of the exponential of Shannon entropy, exp(H_Shannon), in thermodynamics dates from the dawn of the modern atomic theory of matter over a hundred years ago; in that field it gives the number of equally-likely states needed to produce the given entropy. Economists have also long made this fundamental distinction; the term “numbers equivalent” for the effective number of elements of a diversity index is used in that field (Patil and Taillie 1982). The distinction between Shannon entropy and its numbers equivalent or true diversity can be visualized by imagining a dichotomous key to the species of a community. Shannon entropy is proportional to the mean depth of the maximally-efficient dichotomous key to the species of the community (the average number of yes-or-no questions that must be asked to identify a species), but the true diversity is the effective number of terminal branches in the key, and that number increases exponentially with the depth of the key. Several biologists, notably MacArthur (1965), Hill (1973), and Peet (1974), correctly identified diversity with exp(H_Shannon), the effective number of species, but authors of influential standard texts such as Magurran (2004) did not recognize the significance of this, and the concept is seldom used. Yet the results presented here show that this concept clears up most of the many problems in diversity analysis in biology, just as it does in physics and economics.
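The dichotomous-key picture can be sketched numerically. The example below assumes species frequencies that are powers of 1/2, so that the Shannon entropy measured in bits equals the mean number of yes-or-no questions exactly; for other frequencies the match is only approximate. The function names are hypothetical. Measuring entropy in bits and raising 2 to that power gives the same number as exp(H_Shannon) with natural logarithms.

```python
import math

def mean_key_depth(p):
    """Shannon entropy in bits: the mean number of yes-or-no questions
    in the most efficient key, assuming frequencies are powers of 1/2."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def effective_branches(p):
    """Effective number of terminal branches: 2 raised to the mean depth."""
    return 2.0 ** mean_key_depth(p)

# Equally common species: each extra question doubles the number of branches.
for depth in (1, 2, 3, 4, 5):
    S = 2 ** depth
    p = [1.0 / S] * S
    print(depth, mean_key_depth(p), effective_branches(p))

# Hypothetical uneven community: fewer questions are needed on average,
# so the effective number of branches is below the actual four species.
uneven = [0.5, 0.25, 0.125, 0.125]
print(mean_key_depth(uneven), effective_branches(uneven))
```

The depth grows by one question per row while the number of branches doubles, which is the exponential relationship described above.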

To get a feel for this and to learn some of the mathematical properties of effective numbers of species, see the examples I present in Measuring the diversity of a single community and Comparing the diversities of two communities.