Effective number of species
Diversity
indices like the Shannon entropy ("Shannon-Wiener index")
and the Gini-Simpson index are not themselves diversities.
They are just indices of diversity, in the same way that the
diameter of a sphere is an index of its volume but is not itself
the volume. Using the diameter in place of the volume in engineering
equations would give dangerously misleading results. Things
would be even worse if some engineers liked to use surface area,
and if others liked to use circumference in place of volume.
Imagine the chaos if they called all of these things by the
same word and used them interchangeably in engineering equations
that required volume. This is what biologists are doing
with diversity indices.
Diversity indices have a wide variety of ranges and behaviors;
if applied to a system of S equally common species, some give
S, some give log S, some give 1/S, some give 1–1/S, etc.
Some have unlimited ranges while others are always less than
unity. By calling all of these indices “diversities”
and treating them as if they were interchangeable in formulas
or analyses requiring diversities, we will often generate misleading
results.
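To make these ranges concrete, here is a minimal Python sketch that evaluates four common indices on a community of S equally common species. The function names are hypothetical, chosen for illustration, and the Shannon entropy is assumed to use natural logarithms. Species richness returns S, the Shannon entropy returns ln S, the Simpson concentration (sum of squared frequencies) returns 1/S, and the Gini-Simpson index returns 1 - 1/S.

```python
import math


def equal_frequencies(s):
    """Species frequencies for a community of s equally common species."""
    return [1.0 / s] * s


def species_richness(p):
    """Number of species with nonzero frequency."""
    return sum(1 for x in p if x > 0)


def shannon_entropy(p):
    """Shannon entropy with natural logarithms (units: nats)."""
    return -sum(x * math.log(x) for x in p if x > 0)


def simpson_concentration(p):
    """Sum of squared species frequencies."""
    return sum(x * x for x in p)


def gini_simpson(p):
    """1 minus the sum of squared species frequencies."""
    return 1.0 - simpson_concentration(p)


# For S equally common species: richness = S, Shannon = ln S,
# Simpson concentration = 1/S, Gini-Simpson = 1 - 1/S.
for s in (2, 8, 16, 100):
    p = equal_frequencies(s)
    print(
        f"S={s:>3}  richness={species_richness(p):>3}  "
        f"shannon={shannon_entropy(p):.3f}  "
        f"simpson={simpson_concentration(p):.4f}  "
        f"gini_simpson={gini_simpson(p):.4f}"
    )
```

Note that only richness is unbounded and linear in S; the Gini-Simpson index can never reach 1, and the Simpson concentration falls as S rises.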
So what is a true "diversity"? What units should
it be measured in?
It is possible to arrive at a natural and intuitive definition.
In virtually any biological context, it is reasonable to say
that a community with sixteen equally-common species is twice
as diverse as a community with eight equally-common species.
This is so obvious that it seems odd to have to write it. But
it is important to realize what this simple statement implies.
Most diversity indices do not double as we go from
eight species to sixteen species. (Some biologists have noticed
this and concluded that all diversity indices other than species
richness are therefore not to be trusted. We will see below
that this is an incorrect conclusion. Species richness is the
least informative and most imprecise diversity index, in the
sense that it is more subject to random variation than any other
index. Frequency-based diversity indices tell us something important,
but they are not themselves "diversities".)
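The doubling test can be checked directly. This sketch (hypothetical helper name; natural logarithms assumed for the Shannon entropy, though the ratio below does not depend on the base) uses the closed-form value of each index for S equally common species and compares S = 8 with S = 16. Only species richness doubles; the Shannon entropy grows by a factor of 4/3 and the Gini-Simpson index by about 7%.

```python
import math


def indices_for_equal_community(s):
    """Closed-form index values for s equally common species (each frequency 1/s)."""
    return {
        "richness": float(s),
        "shannon": math.log(s),          # -sum(p ln p) with p = 1/s
        "gini_simpson": 1.0 - 1.0 / s,   # 1 - sum(p^2) with p = 1/s
    }


eight = indices_for_equal_community(8)
sixteen = indices_for_equal_community(16)

# A true diversity should double when we go from 8 to 16 equally common species.
for name in eight:
    ratio = sixteen[name] / eight[name]
    print(f"{name:<13} 8 species: {eight[name]:.4f}  "
          f"16 species: {sixteen[name]:.4f}  ratio: {ratio:.3f}")
```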
Going back to the obvious, it seems completely natural to say
that a community with eight equally-common species has a diversity
of eight species, or a community with S equally-common species
has a diversity of S species. This definition behaves as we
expect of a diversity; the diversity of a community of sixteen
equally-common species is double that of a community with eight
equally-common species. Diversity is an unambiguous concept
when we are dealing with communities of equally-common species.
What happens when the species aren't equally common? This is
where the choice of diversity index comes into play. If we choose
a particular index as our index of diversity, then any two communities
that give the same value of the index must have the same
diversity. Any two communities with a Shannon entropy (Shannon-Wiener
index) of 4.5 have the same diversity, according to this index.
We don't know what that diversity is yet (remember, 4.5 is just
the value of the index, not the real diversity) but we do know
that all communities with a Shannon-Wiener index of 4.5 have
the same diversity according to this index. Now if one of those
communities consisted of S equally-common species, we would
know that its true diversity is S by our above definition, and
then we would know that all other communities with a Shannon-Wiener
index of 4.5 must also have diversity S, even if their species
were not equally common.
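The search for "the community of S equally-common species with the same index value" can be sketched numerically. The Python below is a numerical stand-in for the algebra, not the method from the paper: it treats S as a continuous quantity, assumes the index increases with S, and uses bisection to find the S whose equally-common community reproduces a given index value. The helper names are hypothetical.

```python
import math


def shannon_of_equal(s):
    """Shannon entropy (natural log) of s equally common species: ln(s)."""
    return math.log(s)


def gini_simpson_of_equal(s):
    """Gini-Simpson index of s equally common species: 1 - 1/s."""
    return 1.0 - 1.0 / s


def effective_number(index_of_equal, value, lo=1.0, hi=1e9, tol=1e-9):
    """Find s (treated as continuous) with index_of_equal(s) == value.

    Assumes index_of_equal increases with s on [lo, hi]; uses bisection
    until the bracket is narrower than tol relative to its upper end.
    """
    if not (index_of_equal(lo) <= value <= index_of_equal(hi)):
        raise ValueError("value is outside the range reachable on [lo, hi]")
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if index_of_equal(mid) < value:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(effective_number(shannon_of_equal, 4.5))        # about 90.02
print(effective_number(gini_simpson_of_equal, 0.99))  # about 100
```

A decreasing index such as the Simpson concentration would need the comparison inside the loop reversed.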
It is a matter of algebra to find the number of equally-common
species that give a particular value of an index. See my paper,
Entropy and Diversity, for a description
of the algorithm. The number of equally-common species required
to give a particular value of an index is called the "effective
number of species". This is the true diversity
of the community in question. For example, the true diversity
associated with a Shannon-Wiener index of 4.5 is exp(4.5) ≈
90 effective species. The formulas that convert common diversity
indices into true diversities are collected in this table:
Table 1.
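Table 1 is not reproduced here, but as a hedged sketch, the conversions for three common indices are short enough to write out in Python: exp(x) for the Shannon entropy in natural-log units, 1/x for the Simpson concentration, and 1/(1 - x) for the Gini-Simpson index. As a cross-check, the sketch also computes Hill's (1973) numbers of order q directly from species frequencies; order 1 should match the Shannon conversion and order 2 the two Simpson-family conversions. The function names and the example community are illustrative assumptions.

```python
import math


def from_shannon(h):
    """Effective number of species for a Shannon entropy h in nats (natural log)."""
    return math.exp(h)


def from_simpson_concentration(c):
    """Effective number for a Simpson concentration c = sum of squared frequencies."""
    return 1.0 / c


def from_gini_simpson(g):
    """Effective number for a Gini-Simpson index g = 1 - sum of squared frequencies."""
    return 1.0 / (1.0 - g)


def hill_number(p, q):
    """Hill number of order q computed directly from species frequencies p.

    Order 1 is defined as the limit q -> 1, which is exp(Shannon entropy).
    """
    p = [x for x in p if x > 0]
    if q == 1:
        return math.exp(-sum(x * math.log(x) for x in p))
    return sum(x ** q for x in p) ** (1.0 / (1.0 - q))


# Hypothetical community used only to check that the conversions agree.
p = [0.5, 0.3, 0.2]
h = -sum(x * math.log(x) for x in p)
c = sum(x * x for x in p)

print(from_shannon(h), hill_number(p, 1))                # should match
print(from_simpson_concentration(c), hill_number(p, 2))  # should match
print(from_gini_simpson(1.0 - c), hill_number(p, 2))     # should match
```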
Converting indices to true diversities (effective numbers of
species) gives them a set of common behaviors and properties.
After conversion, diversity is always measured in units of number
of species, no matter what index we use. This lets us compare
and interpret them easily, and it lets us develop formulas and
techniques that don't depend on a specific index. It also lets
us avoid the serious misinterpretations spawned by the nonlinearity
of most diversity indices. For more details see What
is diversity?, the first chapter of a book on
diversity analysis that Dr. Anne Chao and I are writing under
contract for Chapman and Hall publishers.
As an example of the practical importance of this, suppose
you are comparing the diversity of aquatic microorganisms before
and after an oil spill. You wouldn't want to measure that diversity
by species richness because even a massive toxic event is sure
to leave a few vagrant individuals of each pre-spill species,
and species richness doesn't distinguish between one individual
of Species X and a million; the pre- and post-spill species counts
might not be very different, even if the pre- and post-spill
species frequencies are very different. So if you are a good
traditional biologist you might use the popular Gini-Simpson
diversity index, which is 1 - (Sum of the squares of species
frequencies). Suppose that the pre-spill Gini-Simpson index
is 0.99 and the post-spill index is 0.97. If you are a good traditional
biologist you would figure out that this drop is statistically
significant, but you would conclude that the magnitude of the
drop is small. You might even say (very wrongly) that
the diversity has dropped by 2%, which sounds like
a small drop, nothing to worry about.
The error that virtually all biologists make is forgetting that the Gini-Simpson
index is not itself a diversity, and that it is highly nonlinear. The
pre-spill community with a Gini-Simpson index of 0.99 has the
same diversity as a community of 100 equally-common species.
The post-spill community with a Gini-Simpson index of
0.97 has the same diversity as a community of 33 equally-common
species. The difference between the pre- and post-spill diversities
is in fact enormous. The drop in diversity is about 67%, not 2%! This
is not just a matter of different definitions of diversity,
as some people would like to say. Rather, it is a matter of
the indices being nonlinear with respect to our intuitive concept
of diversity.
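The oil-spill arithmetic is easy to reproduce. This minimal sketch (hypothetical function name) uses the Gini-Simpson conversion, 1/(1 - x), to turn each index value into an effective number of species, and then compares the relative change in the raw index with the relative change in true diversity.

```python
def gini_simpson_to_effective(g):
    """Number of equally common species with Gini-Simpson index g: 1 / (1 - g)."""
    return 1.0 / (1.0 - g)


pre_index, post_index = 0.99, 0.97
pre = gini_simpson_to_effective(pre_index)    # about 100 effective species
post = gini_simpson_to_effective(post_index)  # about 33.3 effective species

index_drop = (pre_index - post_index) / pre_index  # relative change in the raw index
diversity_drop = (pre - post) / pre                # relative change in true diversity

print(f"index drop: {index_drop:.1%}, diversity drop: {diversity_drop:.1%}")
# prints: index drop: 2.0%, diversity drop: 66.7%
```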
The Shannon entropy is also highly nonlinear. A Shannon entropy
of 6.0 corresponds to about 403 equally-common species while a Shannon
entropy of 5.5 corresponds to about 245 equally-common species. The
former is about 1.65 times as diverse as the latter even though
the difference in the values of the indices is only 8%.
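The same comparison for the Shannon entropy, as a short Python sketch assuming natural logarithms. Because the conversion is exp(H), the ratio of two true diversities depends only on the difference between the entropies: exp(6.0)/exp(5.5) = exp(0.5), about 1.65, whatever the absolute values are.

```python
import math

h_high, h_low = 6.0, 5.5
d_high = math.exp(h_high)  # about 403.4 effective species
d_low = math.exp(h_low)    # about 244.7 effective species

print(f"relative difference in the index: {(h_high - h_low) / h_high:.1%}")
print(f"ratio of true diversities: {d_high / d_low:.2f}")
# The ratio depends only on the difference in entropy: exp(6.0 - 5.5) = exp(0.5).
print(math.isclose(d_high / d_low, math.exp(h_high - h_low)))
```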
There may be times when we really want to know the information
content of communities, and in that case we would use the Shannon
entropy directly. Similarly there might be times when we really
want to use the Gini-Simpson index directly. But when we are
doing diversity analyses, we have to convert them to true diversities
if they are to serve their purpose.
When we convert to true diversities (effective number of species)
we create a powerful and intuitive tool for comparing diversities
of different communities. If one community has a true diversity
of 5 effective species based on some diversity index, and another
has a true diversity of 15 effective species based on the
same diversity index, we can truly say that the second community
is three times as diverse as the first according to that index.
We couldn't draw this conclusion from the raw index itself,
because it uses a nonlinear scale.
Other sciences recognized long ago the importance of the
true diversity of a diversity index, though the concept goes
by different names in different fields. The use of the exponential
of Shannon entropy, exp(H_Shannon), in thermodynamics dates
from the dawn of the modern atomic theory of matter over a hundred
years ago; in that field it gives the number of equally-likely
states needed to produce the given entropy. Economists have
also long made this fundamental distinction; the term “numbers
equivalent” for the effective number of elements of a
diversity index is used in that field (Patil and Taillie 1982).
The distinction between Shannon entropy and its numbers equivalent
or true diversity can be visualized by imagining a dichotomous
key to the species of a community. Shannon entropy is approximately proportional
to the mean depth of the maximally-efficient dichotomous key
to the species of the community (the average number of yes-or-no
questions that must be asked to identify a species), but the
true diversity is the effective number of terminal branches
in the key, and that number increases exponentially with the
depth of the key. Several biologists, notably MacArthur (1965),
Hill (1973), and Peet (1974), correctly identified diversity
with exp(H_Shannon), the effective number of species, but authors
of influential standard texts such as Magurran (2004) did not
recognize the significance of this, and the concept is seldom
used. Yet the results presented here show that this concept
clears up most of the many problems in diversity analysis in
biology, just as it does in physics and economics.
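The dichotomous-key picture can be tested with a short Python sketch. It builds the most efficient binary key with Huffman's algorithm (an assumption; the text does not say how the key is constructed), measures its frequency-weighted mean depth, and compares that with the Shannon entropy in bits. When every frequency is a power of 1/2 the two agree exactly; otherwise the optimal mean depth lies between H and H + 1 bits. Raising 2 to the entropy in bits gives the effective number of terminal branches, the same quantity as exp(H) with natural logarithms. The function names are hypothetical.

```python
import heapq
import math


def mean_key_depth(p):
    """Mean number of yes/no questions in an optimal dichotomous key.

    Uses Huffman's algorithm: repeatedly join the two rarest groups under
    a new question. Each join adds one question for every individual in
    the joined groups, so summing the joined weights gives the
    frequency-weighted mean depth of the key.
    """
    heap = list(p)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        total += a + b
        heapq.heappush(heap, a + b)
    return total


def shannon_bits(p):
    """Shannon entropy with base-2 logarithms (units: bits)."""
    return -sum(x * math.log2(x) for x in p if x > 0)


p = [0.5, 0.25, 0.125, 0.125]
depth = mean_key_depth(p)   # 1.75 questions on average
h = shannon_bits(p)         # 1.75 bits
branches = 2 ** h           # about 3.36 effective species
print(depth, h, branches)
```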
To get a feel for this and to learn some of the mathematical
properties of effective numbers of species, see the examples
I present in Measuring
the diversity of a single community and
Comparing the diversities of two communities.