Measuring the diversity of a single community
The
key point here is that, contrary to common belief, diversity
indices like the Shannon entropy ("Shannon-Wiener index")
and the Gini-Simpson index are not themselves diversities. They
have to be converted to effective numbers of species before
they can be treated as true diversities. Before reading these
examples, please read the page about
effective numbers of species. For a table of formulas
for converting common diversity indices to effective numbers
of species, see Table
1. For an Excel worksheet that does the conversion
for you, download Indices
to diversities.
Examples
Example 1: Let's take a community with 5 equally-common
species:
Species name: | Species frequency: |
Species A | 0.20 |
Species B | 0.20 |
Species C | 0.20 |
Species D | 0.20 |
Species E | 0.20 |
The species richness for this community is 5.0, the Shannon
entropy (Shannon-Wiener index) is 1.609, and the Gini-Simpson
index is 0.8. The Shannon-Wiener index and Gini-Simpson index
are hard to interpret, and the numbers are very hard to compare
with each other since they are all in different units (number
of species, bits per species, and probability). They should
be converted to effective numbers of species, which are the true
diversities. By referring to Table
1 we see that species richness is already a true
diversity, so the true diversity according to this index is
5.000 species. Table 1 shows that the Shannon entropy or Shannon-Wiener
index is converted to effective number of species or true diversity
by taking the exponential: exp(1.609) is 5.000, so the true
diversity according to the Shannon entropy is also 5.000 species.
The Gini-Simpson index is converted to a true diversity
by subtracting it from unity and inverting: 1/(1-0.8) = 5.000
species also. So in fact all these indices agree that the diversity
of this community is 5.000 species. That's because the community
is perfectly even, with no dominance. Note that true diversity
is always measured in units of number of species.
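The arithmetic of Example 1 can be sketched in a few lines of Python. This is a minimal illustration, not the Excel worksheet mentioned above; the function names are assumptions made for this sketch, and natural logarithms are assumed because the text converts Shannon entropy with exp().

```python
import math

def shannon_entropy(freqs):
    """Shannon entropy with natural logarithms; zero frequencies contribute nothing."""
    return -sum(p * math.log(p) for p in freqs if p > 0)

def gini_simpson(freqs):
    """Gini-Simpson index: one minus the sum of the squared frequencies."""
    return 1.0 - sum(p * p for p in freqs)

freqs = [0.20] * 5  # the five equally common species of Example 1

richness = sum(1 for p in freqs if p > 0)
h = shannon_entropy(freqs)
gs = gini_simpson(freqs)

# Conversions to effective numbers of species, as described in the text.
d_richness = float(richness)        # species richness is already a true diversity
d_shannon = math.exp(h)             # exponential of the Shannon entropy
d_gini_simpson = 1.0 / (1.0 - gs)   # subtract from unity and invert

print("raw indices:", richness, round(h, 3), round(gs, 3))
print("effective species:", round(d_richness, 3), round(d_shannon, 3), round(d_gini_simpson, 3))
```

The three raw indices come out as different numbers in different units, while the three converted values coincide for this perfectly even community.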
Example 2: If we did Example 1 again with
ten instead of five equally common species, we would get a species
richness of 10, a Shannon entropy of 2.303, and a Gini-Simpson
index of 0.9. It is intuitively reasonable to say that this
community of 10 equally common species is twice as diverse as
the community with 5 equally common species, but the Shannon entropy
and the Gini-Simpson index of this community are not
twice those of the community of Example 1. That's because
they have not been converted to true diversities (effective
numbers of species). Their corresponding effective numbers of
species are: exp(2.303) = 10 effective species according
to Shannon entropy, and 1/(1-0.9) = 10 effective species according
to the Gini-Simpson index. These are twice those of
Example 1, in agreement with our intuition that this second
community is twice as diverse as the one in Example 1. True
diversities behave intuitively, unlike raw diversity indices.
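The reason Examples 1 and 2 work out so neatly can be written down for any community of equally common species. The symbols S (number of species), p_i (frequency of species i) and H (Shannon entropy with natural logarithms) are notation assumed for this sketch.

```latex
\begin{align*}
p_i &= \frac{1}{S}, \qquad i = 1, \dots, S \\
H &= -\sum_{i=1}^{S} \frac{1}{S} \ln \frac{1}{S} = \ln S,
  \qquad \exp(H) = S \\
1 - \sum_{i=1}^{S} p_i^{2} &= 1 - \frac{1}{S},
  \qquad \frac{1}{1 - \left(1 - \frac{1}{S}\right)} = S \\
\frac{\ln 10}{\ln 5} &\approx 1.43, \qquad
\frac{0.9}{0.8} = 1.125, \qquad
\frac{10}{5} = 2
\end{align*}
```

The last line compares the two communities: the raw Shannon entropies differ by a factor of about 1.43 and the raw Gini-Simpson indices by a factor of 1.125, while the effective numbers of species differ by exactly 2.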
Example 3: Now let's tackle a community with
uneven species frequencies.
Species name: | Species frequency: |
Species A | 0.6 |
Species B | 0.4 |
Its species richness is 2.0, its Shannon entropy is 0.673,
and its Gini-Simpson index is 0.48. Comparing these is like
comparing apples and oranges. If we convert them to effective
numbers of species or true diversities, however, we can compare
them. Species richness is already a true diversity. Converting
Shannon entropy to effective number of species or true diversity
gives exp(0.673) = 1.96 effective species, and converting the
Gini-Simpson index gives 1/(1-0.48) = 1.923 effective species.
Note that they are no longer equal. This indicates the degree
of unevenness or dominance in the community. When there is a
degree of dominance, the Shannon effective number of species
will be less than the species richness, and the Gini-Simpson
effective number of species will be less than the Shannon effective
number of species. The greater the dominance in the community,
the greater the differences between these three numbers.
This is useful information.
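The growing gap between the three numbers can be checked with a short Python sketch. The helper name effective_numbers and the second, more strongly dominated community are hypothetical additions for illustration; only the 0.6/0.4 community comes from Example 3.

```python
import math

def effective_numbers(freqs):
    """Return the effective numbers of species from richness, Shannon entropy
    and the Gini-Simpson index, in that order."""
    present = [p for p in freqs if p > 0]
    richness = float(len(present))
    shannon = -sum(p * math.log(p) for p in present)
    gini_simpson = 1.0 - sum(p * p for p in present)
    return richness, math.exp(shannon), 1.0 / (1.0 - gini_simpson)

example_3 = [0.6, 0.4]
more_dominated = [0.9, 0.1]  # hypothetical community, not from the text

for name, freqs in [("Example 3", example_3), ("more dominated", more_dominated)]:
    d_rich, d_shannon, d_simpson = effective_numbers(freqs)
    print(f"{name}: {d_rich:.3f} >= {d_shannon:.3f} >= {d_simpson:.3f}")
```

Both communities have a species richness of 2, but the drop from richness to the Shannon and Gini-Simpson effective numbers is much larger for the more dominated one.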
Species richness pays no attention to frequencies and just
counts presence or absence. The Shannon entropy weighs each
species exactly according to its frequency. The Gini-Simpson
index pays more attention to the most dominant species since
it involves the sum of the squares of the frequencies, and the
square of a very small frequency is a much smaller number
(for example, 0.01 squared is 0.0001). So uncommon species hardly
contribute to the sum. That's why the effective number of species
from the Gini-Simpson index will always be less than or equal
to the effective number of species from the Shannon-Wiener index.
That is also why the Shannon entropy is a fairer choice as a
diversity index, and why it arises naturally in almost every
science; it weighs species exactly by their frequencies, without
favoring rare or common species.
What do we mean when we say that a community has a diversity
of 15 effective species according to the Gini-Simpson index
(or any other index)? We mean that, according to the Gini-Simpson
index, the community has the same diversity as a community with
15 equally-common species.
Example 4: This example is from Hill (1973).
Let's take the community of Example 3 and divide each species
into two equal parts, say males and females. Let's consider
them as separate species, thus doubling the diversity. The frequencies
are:
Species name: | Species frequency: |
Species A male | 0.3 |
Species A female | 0.3 |
Species B male | 0.2 |
Species B female | 0.2 |
Intuitively, this community where we consider males and females
to be distinct species should be twice as diverse as the original
one. Here are the diversity indices and their effective number
of species before and after splitting each species into two
equal parts:
Raw diversity indices: | Value of diversity index before splitting: | Value of diversity index after splitting: | Ratio after/before splitting |
Species richness | 2.0 | 4.0 | 2.00 |
Shannon entropy | 0.673 | 1.366 | 2.03 |
Gini-Simpson index | 0.48 | 0.74 | 1.54 |
Effective numbers of species: | Value of effective number of species before splitting: | Value of effective number of species after splitting: | Ratio after/before splitting |
Species richness | 2.0 | 4.0 | 2.00 |
Shannon entropy | 1.96 | 3.92 | 2.00 |
Gini-Simpson index | 1.923 | 3.846 | 2.00 |
After splitting, the diversity indices are: species richness
= 4, Shannon entropy = 1.366, and Gini-Simpson index = 0.74. The
Shannon entropy and Gini-Simpson index of this community are
not twice those of Example 3. But the true diversities (the effective
numbers of species) for each index are exactly twice the
corresponding numbers from Example 3, as shown in
the table above. The true diversity according
to species richness went from 2 species to 4 species; the true
diversity according to Shannon entropy went from 1.96 effective
species to 3.92 effective species, and the true diversity according
to the Gini-Simpson index went from 1.923 effective species to
3.846 effective species. This particular behavior, which
Hill called the "doubling property", ensures that
ratios of effective number of species behave reasonably. If
one community is twice as diverse as another (in the sense of
this example), the ratio of their effective numbers of species
is always 2.00, regardless of the index on which this ratio
is based. This is very different from the behavior of the ratio
of raw indices, which can behave very counterintuitively (as
we will see when treating similarity measures).
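The doubling property is easy to verify numerically. The following Python sketch splits every species of Example 3 into two equal halves and compares the effective numbers before and after; the function names are assumptions made for this illustration.

```python
import math

def effective_numbers(freqs):
    """Return the effective numbers of species from richness, Shannon entropy
    and the Gini-Simpson index, in that order."""
    present = [p for p in freqs if p > 0]
    richness = float(len(present))
    shannon = -sum(p * math.log(p) for p in present)
    gini_simpson = 1.0 - sum(p * p for p in present)
    return richness, math.exp(shannon), 1.0 / (1.0 - gini_simpson)

def split_in_two(freqs):
    """Split every species into two equally common halves (e.g. males and females)."""
    return [p / 2 for p in freqs for _ in range(2)]

before = [0.6, 0.4]            # community of Example 3
after = split_in_two(before)   # frequencies 0.3, 0.3, 0.2, 0.2

labels = ("species richness", "Shannon entropy", "Gini-Simpson index")
for label, b, a in zip(labels, effective_numbers(before), effective_numbers(after)):
    print(f"{label}: {b:.3f} -> {a:.3f} effective species, ratio {a / b:.2f}")
```

Every ratio is 2.00, whichever index the effective number of species is derived from.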
There are many diversity indices besides the ones we have used
so far. Applying some of the more exotic ones to this community,
I get a Simpson concentration of 0.26 and a second-order Renyi
entropy of 1.347. Converting the Simpson concentration to effective
number of species, using the formula in Table 1, gives
3.846 effective species. The effective number of species of
the second-order Renyi entropy, using the formula in Table 1,
is the same, 3.846 effective species, and this is exactly the
same effective number of species we obtained using the Gini-Simpson
index. This is not an accident. Any diversity index that is
a function of the sum of the squares of the frequencies has
the same effective number of species for a given community.
So really all these indices are the same for this application.
This is the first hint of the dramatic unification that will
be the subject of Part 2. All indices that are functions
of the sum of the squares of the frequencies can be called "order
2 indices" and their effective number of species is the
"diversity of order 2". All indices
that are functions of the sum of the zeroth power of the frequencies
are "order 0 indices" and
their effective number of species is the "diversity
of order zero", which is species richness. The
effective number of species of the Shannon entropy (Shannon-Wiener
index) is the "diversity of order one".
I will use these terms often here.
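The agreement among the order 2 indices can be illustrated with a short Python sketch. Table 1 is not reproduced on this page, so the conversions used below (inverting the Simpson concentration and taking the exponential of the second-order Renyi entropy) are assumptions; they are the standard conversions and they reproduce the 3.846 quoted above.

```python
import math

freqs = [0.3, 0.3, 0.2, 0.2]  # community of Example 4

sum_of_squares = sum(p * p for p in freqs)

# Three order 2 indices, each a function of the sum of squared frequencies.
simpson_concentration = sum_of_squares
gini_simpson = 1.0 - sum_of_squares
renyi_order_2 = -math.log(sum_of_squares)

# Each conversion undoes the transformation applied to the sum of squares,
# so all three arrive at 1 / sum_of_squares.
d_from_concentration = 1.0 / simpson_concentration
d_from_gini_simpson = 1.0 / (1.0 - gini_simpson)
d_from_renyi = math.exp(renyi_order_2)

print(round(simpson_concentration, 3), round(gini_simpson, 3), round(renyi_order_2, 3))
print(round(d_from_concentration, 3), round(d_from_gini_simpson, 3), round(d_from_renyi, 3))
```

The raw index values differ, but their effective numbers of species are identical, which is why they can all be treated as one "diversity of order 2".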
Which index to use?
Deciding on an index to measure the diversity
of a single community is easier now that so many indices are
shown to be equivalent. The only real question is which order
of diversity should be used: zero, one, or two. (Higher-order
diversities exist but are seldom used.) We will see later that
the diversity of order one (the exponential
of the Shannon entropy) is the only diversity which can be consistently
decomposed into meaningful independent alpha and beta components,
so it should be the standard diversity measure. It also has
the advantage of favoring neither rare nor common species disproportionately;
it counts all species according to their frequency. It is therefore
the "fairest" index, weighting each species
exactly by its frequency in the sample. So for a general-purpose
diversity study, this is the proper choice. For calculating
regional alpha and beta it is the only choice. (It is also the
nearly-universal choice in all other sciences.)
Under what circumstances should we use the diversity
of order zero or order two? The generalized entropy formalism
introduced to biology by Keylock (2005) shows that if we are
especially concerned with the dominant species, we could use
higher order measures. The higher the order, the more the measure
emphasizes the commonest species. Because the diversity
of order two is derived from a well-known measure with
good sampling characteristics (Keylock 2005, Lande 1996), it
is the logical choice for such studies. Conversely, when the
rarest elements of a sample are as important as the commonest
elements (as for example in some conservation biology applications),
the diversity of order zero, species
richness, is a reasonable choice. Specially designed indices
may be needed for specific purposes, as noted by Hurlbert (1971).
When we are measuring only the diversity of a
single community, the trio of diversity of order zero (species
richness), diversity of order one (exponential of Shannon-Wiener
index) and diversity of order two (effective number of species
of any Simpson index) gives more information about the samples
than any single measure. In any study of a single community
it makes sense to give all three. That way readers can judge
the degree of dominance in the community by looking at the drops
between each one. The approach of Hill (1973), who uses a continuous
range of diversities and presents a graph of the results, is
even better because it gives a clearer graphical picture of
the degree of dominance in the community. However, I show elsewhere
that only diversity of order one (the Shannon
case) can be used when calculating alpha and beta diversities
of multiple, unequally-weighted communities.
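A continuous range of diversities in the spirit of Hill (1973) can be sketched as follows. The general formula used here, the sum of the q-th powers of the frequencies raised to the power 1/(1-q) with the exponential of Shannon entropy as the limit at q = 1, is not stated on this page; it is the standard expression for the diversity of order q and is assumed for this sketch. The function name hill_number and the chosen list of orders are likewise hypothetical.

```python
import math

def hill_number(freqs, q):
    """Diversity of order q (effective number of species) for a list of frequencies."""
    present = [p for p in freqs if p > 0]
    if abs(q - 1.0) < 1e-12:
        # Order one is defined as a limit: the exponential of the Shannon entropy.
        return math.exp(-sum(p * math.log(p) for p in present))
    return sum(p ** q for p in present) ** (1.0 / (1.0 - q))

freqs = [0.3, 0.3, 0.2, 0.2]      # community of Example 4
orders = [0, 0.5, 1, 1.5, 2, 3]   # illustrative choice of orders

profile = [(q, hill_number(freqs, q)) for q in orders]
for q, d in profile:
    print(f"order {q}: {d:.3f} effective species")
```

Plotting the diversity against the order gives the kind of graph described above: a flat profile indicates an even community, and a steep decline indicates strong dominance.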
The prejudice often expressed against Shannon measures, and
the frequent criticisms of them in the literature, are unfounded.
Some authors (e.g. Lande 1996, Magurran 2004) recommend the
Gini-Simpson index over Shannon entropy on the grounds that
the former converges more rapidly to its final value and has
an unbiased estimator. Sampling properties should not be the
primary criteria for choosing a measure. More important is the
measure’s ability to correctly capture the theoretical
concept being studied, and only Shannon measures correctly capture
the concepts of alpha and beta when community weights are unequal.
It does no good to have an unbiased, rapidly-converging estimator
of an index if that index doesn’t measure what one needs
to measure. The recent development of a nonparametric estimator
for Shannon entropy (Chao and Shen 2003) makes these
sampling criticisms even less relevant. This nonparametric estimator
for Shannon entropy converges rapidly with little bias even
when applied to small samples.
Another often-repeated criticism of Shannon measures is that
they have no clear biological interpretation. Shannon entropy
does in fact have an interpretation in terms of interspecific
encounters (Patil and Taillie 1982), and both H_Shannon and
exp(H_Shannon) can be related to characteristics of species
keys (see Effective
number of species) and to biologically reasonable
notions of uncertainty (Shannon 1948) and average rarity (Patil
and Taillie 1982). Nevertheless, as with the sampling issues,
this criticism is really irrelevant. If alpha and beta diversity
are being studied, one chooses the measure that best captures
the notions of alpha and beta, and only Shannon measures can
be decomposed into meaningful independent alpha and beta components
when community weights are unequal.
Some of the same authors who are critical of Shannon measures
because of their sampling properties (e.g. Magurran 2004; see
my review of this book)
recommend species richness and its associated similarity and
overlap measures, the Jaccard and Sorensen indices. These measures
have much worse sampling properties than Shannon measures (Lande
1996, Magurran 2004). Since they are completely insensitive
to differences in species frequencies, they are poor choices
for distinguishing communities or comparing pre- and post-treatment
diversities. Real communities almost always have rare vagrants,
but these measures give them the same weight as dominant species
in calculating the similarity or overlap of two communities.
Many authors know these shortcomings of order 0 measures but
use them anyway, even when frequency data are available, because
of a generalized mistrust of frequency-based diversity and similarity
measures. As shown here, that mistrust of traditional diversity
measures was justified, but frequency data provide important
information that should be used when available. The new expressions
for alpha and beta remove the anomalies of the traditional definitions,
and the conversion of properly-defined frequency-based measures
to their numbers equivalents makes them linear with respect
to our intuitive ideas of diversity. They are now almost as
easy to interpret as species richness, and much more reliable
and informative. The same is true for similarity and overlap
measures; the Horn index of overlap (Eq. 23) is more informative,
discriminating, and reliable than either the Jaccard or Sorensen
indices.
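The effect of rare vagrants on presence/absence measures can be illustrated with a small Python sketch. The two communities below are hypothetical. Eq. 23 is not reproduced on this page, so the Horn overlap is written here in a commonly used form for two equally weighted communities, one minus the Shannon beta entropy divided by ln 2; treat that formula and all function names as assumptions of this sketch.

```python
import math

def shannon_entropy(freqs):
    """Shannon entropy (natural logarithms) of a species -> frequency mapping."""
    return -sum(p * math.log(p) for p in freqs.values() if p > 0)

def jaccard(a, b):
    """Shared species divided by total species; frequencies are ignored."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)

def sorensen(a, b):
    """Twice the shared species divided by the sum of the two richnesses."""
    sa, sb = set(a), set(b)
    return 2 * len(sa & sb) / (len(sa) + len(sb))

def horn_overlap(a, b):
    """Assumed equal-weight form: 1 - (pooled entropy - mean entropy) / ln 2."""
    pooled = {s: 0.5 * (a.get(s, 0.0) + b.get(s, 0.0)) for s in set(a) | set(b)}
    h_alpha = 0.5 * (shannon_entropy(a) + shannon_entropy(b))
    return 1.0 - (shannon_entropy(pooled) - h_alpha) / math.log(2)

# Hypothetical communities: the same dominant species, different rare vagrants.
community_1 = {"A": 0.98, "B": 0.01, "C": 0.01}
community_2 = {"A": 0.98, "D": 0.01, "E": 0.01}

print("Jaccard: ", round(jaccard(community_1, community_2), 3))
print("Sorensen:", round(sorensen(community_1, community_2), 3))
print("Horn:    ", round(horn_overlap(community_1, community_2), 3))
```

The presence/absence measures report low similarity because four of the five species are unshared, even though 98% of the individuals in each community belong to the same species; the frequency-based overlap stays close to 1.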
While Fisher’s alpha is not strictly a nonparametric
index, it is sometimes used as if it were (Magurran 2004). However,
there are strong reasons to avoid this index for general use.
When the data are not log-series distributed this index throws
away almost all the information in the sample (since it depends
only on the sample size and the number of species in the sample,
not the actual species frequencies) and gives uninterpretable
results. For example, a sample containing ten species with abundances
[91, 1, 1, 1, 1, 1, 1, 1, 1, 1]
has the same diversity, according to this index, as a sample
containing ten species with abundances
[10, 10, 10, 10, 10, 10, 10, 10, 10, 10], whereas ecologically
and functionally the second community is much more diverse than
the first.
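This can be checked numerically. The sketch below assumes the usual defining relation for Fisher's alpha, S = alpha * ln(1 + N/alpha), where N is the sample size and S the number of species in the sample, and solves it by simple bisection; the function names are hypothetical. For contrast it also computes the exponential of the Shannon entropy for the same two samples.

```python
import math

def fisher_alpha(n_individuals, n_species, iterations=200):
    """Solve S = alpha * ln(1 + N / alpha) for alpha by bisection (needs N > S)."""
    def excess(alpha):
        return alpha * math.log1p(n_individuals / alpha) - n_species

    low, high = 1e-9, 1e9   # excess() is negative at low, positive at high
    for _ in range(iterations):
        mid = 0.5 * (low + high)
        if excess(mid) < 0:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)

def shannon_effective_species(counts):
    """Exponential of the Shannon entropy computed from raw abundances."""
    total = sum(counts)
    return math.exp(-sum((c / total) * math.log(c / total) for c in counts if c > 0))

dominated = [91, 1, 1, 1, 1, 1, 1, 1, 1, 1]
even = [10] * 10

for name, counts in [("dominated", dominated), ("even", even)]:
    alpha = fisher_alpha(sum(counts), len(counts))
    print(f"{name}: Fisher's alpha = {alpha:.3f}, "
          f"Shannon effective species = {shannon_effective_species(counts):.3f}")
```

Because the function only ever sees the sample size (100) and the species count (10), it cannot distinguish the two samples, whereas the Shannon effective number of species separates them clearly.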
There are circumstances in which biologists should not convert
to effective number of species. If one is studying not diversity
but some other thing, such as the way that the probability of
intraspecific encounters varies between communities, or the
average uncertainty in identifying a species, then it makes
sense to use a measure that directly calculates the quantity
of interest. But this should not be confused with true diversity.