For example, Figure 1 illustrates the relationship between geographic range area and body size of 46 species of midwestern fishes studied in the Cimarron River, Oklahoma. Each point in this graph represents a single species. This data set is available to you in the Tutorial Data Sets folder (Midwest fishes.txt) and will be used to illustrate the features of EcoSim's macroecology module.
![]() | Figure 1. Relationship between body size (= standard length) and area of geographic range for 46 species of midwestern fishes. Data from Gotelli and Taylor (1999a). The dashed line indicates a potential boundary, suggesting that species with large geographic ranges and large body sizes are perhaps uncommon. |
There appear to be relatively few points in the upper right-hand corner of the plot. In other words, species with large geographic ranges tend not to be large in body size. Or do they? Suppose there were no ecological or evolutionary constraints on range size and body size. What would the data in Figure 1 look like? EcoSim provides you with a number of simulation tools for answering this question. Although macroecology emphasizes the holistic nature of such data sets, we think it is essential that patterns such as Figure 1 be tested against an explicit null hypothesis (Blackburn et al. 1990, Enquist et al. 1995).
As in all EcoSim analyses, we can simulate patterns such as Figure 1 by randomizing an original data set. In the context of macroecology, these randomizations assume that, in a null community, the traits of species are independent of one another (Blackburn and Gaston 1998). But in reality, closely related species have similar traits by common ancestry, and they may not represent independent data points for the purposes of statistical analysis. The comparative method (Harvey and Pagel 1991) seeks to address these problems by mapping species traits onto phylogenetic trees and then using methods such as phylogenetic regression (Garland et al. 1993) to adjust for non-independence.
However, there are some limitations to the phylogenetic approach. The most serious is that good phylogenies are still not available for many taxa, although this is changing rapidly with the widespread availability of good molecular sequence data. Phylogenetic analysis also rests on assumptions about the mode of character evolution, which may be difficult to justify for many ecological characters. Finally, phylogenetic "corrections" may not uniquely remove the historical factors that can lead to ecological correlations, and could even remove some of the pattern we're looking for. On a more practical level, the results of many phylogenetic analyses may be similar to those that ignore the non-independence of species (Ricklefs and Starck 1996). All of the tests provided in this module assume the statistical independence of species.
Although the tests in this module are illustrated with macroecology data, the analytical problems are much broader than this. Other authors have studied the problem of detecting patterns in two-dimensional graphs when there is an "upper bound" or "factor ceiling" in operation. Conventional tests such as linear regression may not always reveal these boundaries or ceilings. Sophisticated alternatives include polynomial regression (Blackburn et al. 1990), quantile regression (Blackburn et al. 1992; Scharf et al. 1998; Cade et al. 1999), path analysis (Thomson et al. 1996), and other techniques (Garvey et al. 1998). The tests we present are applicable to the general problem of detecting non-random patterns in bivariate scatterplots of data (Figure 1).
To keep things simple in two-dimensional space, we have restricted this module to the analysis of non-negative real numbers. This means that all of the data points, when plotted, will fall in the upper right-hand quadrant of Cartesian coordinate space. EcoSim does not carry out transformations, so if some of your data are negative (say, from a logarithmic transformation), be sure to add a constant so that all the values are positive. Although EcoSim will accept a large input matrix, only two columns of data are analyzed at a time. EcoSim lets you specify which column represents the x-variable and which column represents the y-variable.
As in all EcoSim modules, the first column is reserved for species names, and the first row is reserved for site names. See Importing Data for restrictions on species and site names.
![]() | Figure 2. Symmetric (left column) and asymmetric (right column) shapes generated by EcoSim. Asymmetric shapes are generated by using the median x and median y as inflection points. The panels illustrate the left triangle, right triangle, pyramid, and inverted pyramid shapes. |
For the symmetric option, the four boundaries (upper right, lower right, upper left, and lower left) are found by connecting the four midpoints of the x and y variables [(min x, (max y + min y)/2), ((max x + min x)/2, max y), (max x, (max y + min y)/2), and ((max x + min x)/2, min y)], forming a symmetrical "diamond" in the data space.
The asymmetric option gives a slightly more complex pattern. This option takes into account the fact that the distribution of x and y values may not be symmetric, so that the simple triangles and pyramids may give a distorted impression of where the data points lie. The asymmetric option uses the median x and median y values to define the shapes. For the pyramid shapes, the point of the pyramid is set at the median value of x, whereas in the symmetric option the pyramid point occurs at (max x + min x)/2. For the triangle shapes, EcoSim now creates a 4-sided polygon with an "inflection point" at (median x, median y).
For the asymmetric option, the four boundaries (upper right, lower right, upper left, and lower left) are found by connecting the four median points of the x and y variables [(min x, med y), (med x, max y), (max x, med y), and (med x, min y)], forming an asymmetrical "kite" in the data space.
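The two sets of vertices described above can be sketched in a few lines of Python. This is only an illustration of the geometry, not EcoSim's own code; the function name `boundary_vertices` and the toy data are hypothetical.

```python
from statistics import median

def boundary_vertices(xs, ys, asymmetric=False):
    """Return the four vertices of the symmetric "diamond" or asymmetric "kite".

    Hypothetical helper: the symmetric option uses the midpoints of the
    data limits, the asymmetric option uses the medians of x and y.
    """
    min_x, max_x = min(xs), max(xs)
    min_y, max_y = min(ys), max(ys)
    if asymmetric:
        cx, cy = median(xs), median(ys)
    else:
        cx, cy = (max_x + min_x) / 2, (max_y + min_y) / 2
    return {
        "left": (min_x, cy),
        "top": (cx, max_y),
        "right": (max_x, cy),
        "bottom": (cx, min_y),
    }

# Toy, right-skewed data (not the fish data set)
xs = [1, 2, 3, 4, 20]
ys = [10, 12, 14, 40, 50]
diamond = boundary_vertices(xs, ys)
kite = boundary_vertices(xs, ys, asymmetric=True)
print(diamond)
print(kite)
```

With skewed data such as these, the kite's center sits much closer to the lower left than the diamond's center does, which is the distortion the asymmetric option is meant to correct. Each boundary is the segment joining two adjacent vertices, for example "top" to "right" for the upper right boundary.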
All of this is a lot easier to understand by looking at some pictures!
Figure 2 uses the same data from Figure 1 to illustrate the four data shapes with the asymmetric and symmetric data options. Figure 3 illustrates the 4 boundaries that are created in each corner of the data space for the symmetric and asymmetric data options.
![]() | Figure 3. Symmetric (left panel) and asymmetric (right panel) boundaries generated by EcoSim. Symmetric boundaries are generated by using (max x + min x)/2 and (max y + min y)/2 as cut points. Asymmetric boundaries are generated by using the median x and median y as cut points. Each panel illustrates the 4 boundaries (upper right, upper left, lower right, and lower left) that can be selected with the symmetric and asymmetric data options. |
EcoSim offers you three options for constructing the null distribution of x and y values.
1) Data-defined. This is the simplest (and best) option, and the one that EcoSim uses as the default. To create the null data sets, EcoSim simply reshuffles the ordering of the y values, randomly pairing them with the x values. This option retains the variances and distributions of the original x and y variables, but eliminates any pattern in the covariance of x and y together.
2) User-defined (Uniform). When this option is chosen, a small edit box appears, and the user enters a minimum and a maximum for both the x and y variables. For both the x and y variables, EcoSim creates random uniform values that are greater than the specified minimum and less than the specified maximum. The specified maximum must be greater than the specified minimum, and both the maximum and minimum values must be greater than zero. The "defaults" that appear in the edit window are the observed data limits themselves.
3) User-defined (Normal). For this option, a dialog box opens and the user is asked to supply the mean and the standard deviation of a normal distribution for the x and y variables. Non-negative real numbers are required for all 4 of these values. EcoSim then draws random values from these distributions to supply the x and y observations. Using the normal distribution, EcoSim may sometimes generate negative values, especially if the requested mean is small and/or the standard deviation is large. If this happens, EcoSim will discard the negative value and draw another observation until it gets a non-negative number. Therefore, the actual mean and variance of the simulated distribution may be different from the values specified in the dialog box. The "defaults" that appear in the edit window are the observed means and standard deviations of the x and y variables.
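A minimal Python sketch of the three options may make the differences concrete. It follows the descriptions above but is not EcoSim's implementation; all function names are hypothetical, and Python's `random` module stands in for whatever generator EcoSim uses.

```python
import random

def data_defined(xs, ys, rng):
    """Option 1: keep x as is and randomly re-pair the observed y values."""
    shuffled = list(ys)
    rng.shuffle(shuffled)
    return list(xs), shuffled

def uniform_null(n, x_limits, y_limits, rng):
    """Option 2: draw x and y independently between user-supplied limits."""
    (x_lo, x_hi), (y_lo, y_hi) = x_limits, y_limits
    if not (0 < x_lo < x_hi and 0 < y_lo < y_hi):
        raise ValueError("limits must satisfy 0 < minimum < maximum")
    xs = [rng.uniform(x_lo, x_hi) for _ in range(n)]
    ys = [rng.uniform(y_lo, y_hi) for _ in range(n)]
    return xs, ys

def nonnegative_normal(mean, sd, rng):
    """Draw from a normal distribution, redrawing whenever the value is negative."""
    while True:
        value = rng.gauss(mean, sd)
        if value >= 0:
            return value

def normal_null(n, x_params, y_params, rng):
    """Option 3: draw x and y independently from (mean, sd) normal distributions."""
    xs = [nonnegative_normal(*x_params, rng) for _ in range(n)]
    ys = [nonnegative_normal(*y_params, rng) for _ in range(n)]
    return xs, ys

rng = random.Random(10)  # arbitrary seed; results will not match EcoSim's
sx, sy = data_defined([1, 2, 3], [4, 5, 6], rng)
ux, uy = uniform_null(50, (1, 2), (3, 4), rng)
nx, ny = normal_null(50, (0.5, 2.0), (0.5, 2.0), rng)
```

Note that the redraw loop in `nonnegative_normal` truncates the distribution at zero, which is why the realized mean and variance can drift from the requested values when the mean is small relative to the standard deviation.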
Select one of the 4 corners for the boundary test of your data. The darkened corner of the icon indicates the corner of the space that is being tested with the boundary test. EcoSim will provide you with two tests of the pattern associated with that boundary: the number of points that fall beyond the boundary, and the sum of squares of those points. If some corners of the space are unusually empty, the observed number of points and/or the sum of squares in the real data set will be significantly less than in the simulated data sets.
Our test is similar to a "range restriction" test developed by P. Wilson (unpublished manuscript), which is described by Thomson et al. (1996).
Because of the large number of tests and graphical displays, there are no fewer than 11 output tabs for a single run of Macroecology! These should appear as two rows of tabs on your screen, and you should work through them in the following order.
![]() | Figure 4. Quadrants for Midwestern fishes data. The four quadrants are defined by the position of the point (median x, median y). EcoSim calculates the variance in the number of points that fall in each quadrant. |
EcoSim next counts the number of points that occur in each quadrant (ties are scored as half or quarter points). The variance of these 4 numbers is calculated as a simple index of dispersion. If the original data are randomly distributed with covariance that is close to zero, then the observed variance will be similar to the variance that is calculated for the simulated data sets. On the other hand, if points are unusually concentrated in some corners of the space, the null hypothesis will be rejected, and the observed variance will be significantly larger than expected. Finally, if the points are distributed very evenly among the four quadrants, the observed variance will be significantly smaller than expected. This test is similar in spirit to the two-dimensional K-S test described by Garvey et al. (1998).
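The dispersion index can be sketched as below. The tie rule follows the description above (a point on one median line is split between two quadrants, a point on both is split among all four). Using the sample variance (n - 1 denominator) is an assumption, since the text does not say which variance formula EcoSim applies; the function name is hypothetical.

```python
from statistics import median, variance

def quadrant_counts(xs, ys):
    """Count points per quadrant around (median x, median y), splitting ties."""
    mx, my = median(xs), median(ys)
    counts = {"UL": 0.0, "UR": 0.0, "LL": 0.0, "LR": 0.0}
    for x, y in zip(xs, ys):
        # A point on a median line belongs equally to both sides of that line.
        cols = ["L"] if x < mx else ["R"] if x > mx else ["L", "R"]
        rows = ["U"] if y > my else ["L"] if y < my else ["U", "L"]
        weight = 1.0 / (len(rows) * len(cols))  # 1, 1/2, or 1/4
        for r in rows:
            for c in cols:
                counts[r + c] += weight
    return counts

# Toy data: a perfect positive relationship, so points pile up in LL and UR
xs = [1, 2, 3, 4, 5]
ys = [1, 2, 3, 4, 5]
counts = quadrant_counts(xs, ys)
dispersion = variance(counts.values())  # assumption: sample variance
print(counts, dispersion)
```

A strongly correlated data set like this one concentrates points in two opposite quadrants, which inflates the variance of the four counts relative to reshuffled data.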
![]() | Figure 5. Regression charts tab for Midwestern fishes data. The upper panel shows the regression slope (red line) for the observed data set. The lower panel shows one of the simulated data sets, and the average regression line for all of the simulations. |
![]() | Figure 6. Boundary charts tab for Midwestern fishes data. The upper panel shows the observed data set, and the lower panel shows one of the simulated data sets. In both panels, the red line is the upper right boundary. |
Next, it presents the information that was contained in each of the histogram tabs for this module: dispersion, regression slope, shape # of points, shape sum of squares, boundary # of points, and boundary sum of squares. For each simulation, the summary window shows the observed and expected metric, the probability value, and the histogram bins for the simulated data sets. It also gives the standardized effect size, calculated as: (observed index - mean(simulated indices)) / standard deviation(simulated indices)
This metric is analogous to the standardized effect size that is used in meta-analyses (Gurevitch et al. 1992). It scales the results in units of standard deviations, which allows for meaningful comparisons among different tests. Roughly speaking, a standardized effect size that is greater than 2 or less than -2 is statistically significant with a tail probability of less than 0.05. However, this is only an approximation, and it assumes that the data are normally distributed, which is often not the case for null model tests. For any individual study, you should always report the actual tail probability, which is calculated directly from the simulation, and does not require any assumptions about normality of the data.
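As a rough sketch, the standardized effect size and the simulation-based tail probabilities could be computed like this. The function names are hypothetical, the sample standard deviation is an assumption, and the tail probabilities are taken here as the fraction of simulated indices at or beyond the observed one, which may differ in detail from how EcoSim tallies ties.

```python
from statistics import mean, stdev

def standardized_effect_size(observed, simulated):
    """(observed - mean of simulated) / standard deviation of simulated."""
    return (observed - mean(simulated)) / stdev(simulated)

def tail_probabilities(observed, simulated):
    """Fractions of simulated indices that are <= and >= the observed index."""
    n = len(simulated)
    p_lower = sum(s <= observed for s in simulated) / n
    p_upper = sum(s >= observed for s in simulated) / n
    return p_lower, p_upper

# Toy values standing in for the indices from five simulated data sets
simulated = [1, 2, 3, 4, 5]
observed = 6
ses = standardized_effect_size(observed, simulated)
p_lower, p_upper = tail_probabilities(observed, simulated)
print(ses, p_lower, p_upper)
```

The tail probability comes straight from the simulated distribution, so it remains valid even when that distribution is skewed and the "greater than 2" rule of thumb for the effect size is unreliable.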
Finally, the summary tab shows the original data matrix, with labels.
All of these data can be edited, deleted, or annotated. The output can then be saved (Save to File) or discarded (Close). There is also a small time clock in the lower right-hand corner so you can tell how long your simulation took.
But even at this point, it is possible to say a few things about how the test should be used. One of the most important issues is the randomization algorithm to be used. Unless you have a good reason to do otherwise, we strongly recommend that you retain the default option of the data-defined constraint. This creates a null data set by simply reshuffling the observed values of x and y. The strength of this approach is that it retains the variances of x and y, so that any significant results are due to patterns in the covariance of the two variables. Outliers and asymmetric data distributions are fully retained with this option.
We have included the user-defined normal and uniform options in case you have other null expectations for the x and y variable. However, we caution that these distributions can easily generate patterns that are very different from those in the original data set, and will often lead to the rejection of the null hypothesis. It is interesting that the uniform distribution is probably the implicit null hypothesis that people use when evaluating macroecology scatterplots, because it implies that the variable space should be randomly and uniformly "filled". However, some parts of the macroecological space may be rare simply because the density of both x and y variables is low in that region, not because the joint distribution of x and y is unfavored.
Of the tests that are presented, the dispersion test and regression slope are the most general tests for non-randomness in the covariance of x and y. They are a good starting place for evaluating the distribution of your data, and many times they may be highly non-random even though the other boundary and shape tests are not.
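A randomization test of the regression slope under the data-defined option might look like the sketch below. It assumes an ordinary least-squares slope of y on x and an upper-tail probability defined as the fraction of simulated slopes at least as large as the observed one; the function names and the toy data are hypothetical, and the seed will not reproduce EcoSim's random numbers.

```python
import random
from statistics import mean

def ols_slope(xs, ys):
    """Ordinary least-squares slope of y on x."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def slope_null_test(xs, ys, n_sim=1000, seed=10):
    """Compare the observed slope with slopes from reshuffled y values."""
    rng = random.Random(seed)
    observed = ols_slope(xs, ys)
    shuffled = list(ys)
    simulated = []
    for _ in range(n_sim):
        rng.shuffle(shuffled)  # data-defined null: re-pair y with x at random
        simulated.append(ols_slope(xs, shuffled))
    p_upper = sum(s >= observed for s in simulated) / n_sim
    return observed, mean(simulated), p_upper

# Toy data with a perfect linear relationship (y = 2x + 1)
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2 * x + 1 for x in xs]
observed, expected, p_upper = slope_null_test(xs, ys)
print(observed, expected, p_upper)
```

Because reshuffling breaks the pairing of x and y, the simulated slopes scatter around zero, and an observed slope far out in either tail indicates a non-random covariance.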
The results of the boundary and shape sum of squares tests need to be carefully interpreted because they will reflect not only the placement of the points relative to the boundary, but also the number of points within the shape or beyond the boundary. Finally, we note that the boundary tests may often give uninformative results if they are applied indiscriminately to all of the corners of the distribution, particularly when there appears to be a three-sided "triangle" shape. We recommend that you examine the major shapes in your data first, or, even better, establish a priori hypotheses about shapes and boundaries from the theoretical literature (Brown 1995, Maurer 1999).
Each row of the data set gives a different species of fish. The macroecological variables (= columns) in this data set are:
FRACT 10 sites on the Cimarron River were censused between 1976 and 1988, and this variable is the average fraction of sites occupied each year.
EXT The average annual probability of extinction for an occupied site.
COL The average annual probability of colonization for an unoccupied site.
DIST The distance in km from the center to the edge of the species' geographic range.
AREA The area of the geographic range in km².
SIZE The standard length in mm, a convenient measure of body size for fishes.
EDGE An index of the position of the sites on the Cimarron River relative to the edge of the geographic range. The larger the index, the closer the sites are to the edge of the geographic range. See Gotelli and Taylor (1999a) for details.
ABUN The average abundance of each species in occupied sites.
For this tutorial, let's examine the relationship between body size and geographic range area, illustrated in Figures 1-3. Select AREA as the x variable and SIZE as the y variable. Go to the "general" tab and set the random number seed to 10 so that your results will exactly match those in this tutorial.
We are initially interested in whether or not there is a left "triangle" pattern as shown in Figure 1, so we will keep the defaults, which specify a symmetric left triangle, with a boundary test for the upper right-hand corner.
The dispersion tab counts the number of data points in each of the 4 quadrants of the sample space (Figure 4) and calculates the variance of those counts. The observed variance was 27.0, whereas the average variance of the 1000 simulated data sets was only 3.85. The tail probability (shown in the lower panel) for the observed variance is 0.021. These results suggest that the points are not randomly distributed in the two-dimensional space: some quadrants in the space have too many points and others have too few compared to the randomized data sets.
The regression slope tab gives a standard regression slope of 0.00004, which is not significantly greater than the simulated slope of 0.00 (p = 0.127). The regression charts tab confirms visually that the slope of the observed data is positive, but not an extreme value. The shape # of points tab indicates that 41 of the 46 data points fell within the (symmetric) left triangle shape. This does not differ significantly from the average of the simulated values (41.20; p = 0.751). If the observed data points were unusually concentrated in the triangle, then the simulated data sets would usually have contained substantially fewer than 41 points in the triangle.
The shape sum of squares test suggests that the observed sum of squares is larger than the simulated value, but not significantly so (p = 0.063).
Finally, the boundary # of points and the boundary sum of squares tests confirm that the upper right-hand corner of the space is not unusually "empty" even though there is only a single observation in that region of the space (# of points p = 0.964; sum of squares p = 0.790). Because there are few species with large geographic ranges and few species with large body sizes, we don't expect many species to occur in this corner of the space and should not be puzzled by their absence. In fact, if there is any pattern in the shapes of these data, it is in the right triangle. Run the simulation again for the "right triangle" shape. Thirty-six points fell within the symmetric right triangle, compared to an average of only 32.57 points for the simulated data sets (p = 0.045).
Non-randomness is also indicated in some of the boundary tests. If you test each of the 4 corners (upper right, lower right, upper left, and lower left), you will discover that there is an odd distribution of points in the lower regions of the graph. Although the observed number of points (25) in the lower left-hand corner of the graph is not unusual, the observed sum of squares (7.52 × 10^6) is much greater than expected (p = 0.006): the observed points are a bit "too close" to the origin. Conversely, there was only 1 data point in the lower right-hand corner of the graph, and this was significantly fewer than expected (expected = 3.02, p = 0.044). These patterns are probably responsible for the significance of the dispersion test.
When you carry out these analyses, nearly all of the statistical tests give significant results. If you examine the chart tabs, you will see that the observed data look quite different from the simulated data sets, for which the sample space is fairly evenly filled with data points.
Another variation is to use the user-defined normal option, which uses a normal distribution to draw the x and y values. As before, EcoSim pops up an edit window that lets you specify this distribution. EcoSim conveniently calculates and provides you with the observed means and standard deviations as defaults.
Compared to the data-defined option, the normal distribution also leads to a frequent rejection of the null hypothesis, but not as often as with the uniform distribution. The reason is that the normal distribution leaves the four corners of the space relatively sparse, because points are generated less frequently in the tails of the x and y distributions. Again, examine the chart tabs to see how these simulated data sets look compared to the actual data set.
Index | Symmetric: Data-Defined | Symmetric: Uniform | Symmetric: Normal | Asymmetric: Data-Defined | Asymmetric: Uniform | Asymmetric: Normal |
---|---|---|---|---|---|---|
Dispersion | + | + | + | + | + | + |
Regression | ns | ns | ns | ns | ns | ns |
Left Triangle (#) | ns | +++ | + | ns | ns | ns |
Left Triangle (ss) | ns | +++ | +++ | ns | ++ | + |
Right Triangle (#) | + | +++ | ++ | ns | ns | ns |
Right Triangle (ss) | ns | ns | ns | ns | --- | --- |
Pyramid (#) | ns | +++ | + | + | +++ | ++ |
Pyramid (ss) | ns | +++ | + | ns | +++ | +++ |
Inverted Pyramid (#) | ns | --- | -- | ns | ns | ns |
Inverted Pyramid (ss) | ns | --- | ns | ns | ns | ns |
Upper Right (#) | ns | - | ns | ns | - | ns |
Upper Right (ss) | ns | - | ns | ns | ns | ns |
Upper Left (#) | ns | ns | ns | ns | ns | ns |
Upper Left (ss) | ns | ns | ns | ns | ns | + |
Lower Right (#) | - | - | ns | ns | ns | ns |
Lower Right (ss) | ns | ns | ns | ns | --- | ns |
Lower Left (#) | ns | +++ | +++ | ns | + | ++ |
Lower Left (ss) | ++ | +++ | +++ | ++ | ns | ns |
Table 1. Summary of null model macroecology tests for the relationship between body size and geographic range area (Figure 1). # = number of points; ss = sum of squares; ns = non-significant (p > 0.05). Plus symbols (+, ++, +++) indicate the observed index was significantly greater than expected; minus symbols (-, --, ---) indicate the observed index was significantly less than expected. One symbol = p < 0.05; two symbols = p < 0.01; three symbols = p < 0.001. |
Some general results are apparent from these comparisons. The first is that the overall distribution of points is non-random, although the conventional regression slope is not significantly different from zero. Compared to a uniform distribution, the observed data set seems to fit the triangle or pyramid distributions. Using the more realistic data-defined distribution, the null hypothesis is rarely rejected for the shape and boundary tests, although there does appear to be a weak clustering of points within the pyramid, or the right triangle, depending on the symmetry option chosen.
Thus, there does not seem to be a simple "evolutionary boundary" (Figure 1) in which combinations of large geographic range and large body size are uncommon. If anything, there is a clustering of large body sizes at intermediate (pyramid) or large (right triangle) geographic ranges. Most of the tests give a significant sum of squares for points in the lower left-hand corner. In other words, there is a slight excess of species that have especially small geographic ranges and small body sizes. These patterns are subtly different from one in which there are too few species with large range and large body sizes.
Blackburn, T.M. and K.J. Gaston. 1998. Some methodological issues in macroecology. American Naturalist 151: 68-83.
Blackburn, T.M., J.H. Lawton, and J.N. Perry. 1992. A method of estimating the slope of upper bounds of plots of body size and abundance in natural animal assemblages. Oikos 65: 107-112.
Brown, J.H. and B.A. Maurer. 1989. Macroecology: the division of food and space among species on continents. Science 243: 1145-1150.
Brown, J.H. 1995. Macroecology. University of Chicago Press, Chicago.
Cade, B.S., J.W. Terrell, and R.L. Schroeder. 1999. Estimating effects of limiting factors with regression quantiles. Ecology 80: 311-323.
Enquist, B.J., M.A. Jordan, and J.H. Brown. 1995. Connections between ecology, biogeography, and paleobiology: Relationship between local abundance and geographic distribution in fossil and recent molluscs. Evolutionary Ecology 9: 586-604.
Garland, T., Jr., A.W. Dickerman, C.M. Janis, and J.A. Jones. 1993. Phylogenetic analysis of covariance by computer simulation. Systematic Biology 42: 265-292.
Garvey, J.E., E.A. Marschall, and R.A. Wright. 1998. From star charts to stoneflies: detecting relationships in continuous bivariate data. Ecology 79: 442-447.
Gotelli, N.J. and C.M. Taylor. 1999a. Testing macroecology models with stream-fish assemblages. Evolutionary Ecology Research 1: 847-858.
Gotelli, N.J. and C.M. Taylor. 1999b. Testing metapopulation models with stream-fish assemblages. Evolutionary Ecology Research 1: 835-845.
Gurevitch, J., L.L. Morrow, A. Wallace, and J.S. Walsh. 1992. A meta-analysis of field experiments on competition. American Naturalist 140: 539-572.
Harvey, P.H. and M.D. Pagel. 1991. The Comparative Method in Evolutionary Biology. Oxford University Press, Oxford.
Maurer, B.A. 1999. Untangling Ecological Complexity: The Macroscopic Perspective. University of Chicago Press, Chicago.
Pigg, J. 1988. Aquatic habitats and fish distribution in a large Oklahoma river, the Cimarron, from 1976-1988. Proceedings of the Oklahoma Academy of Sciences 68: 9-31.
Ricklefs, R.E. and J.M. Starck. 1996. Applications of phylogenetically independent contrasts: A mixed progress report. Oikos 77: 167-172.
Scharf, F.S., F. Juanes, and M. Sutherland. 1998. Inferring ecological relationships from the edges of scatter diagrams: comparison of regression techniques. Ecology 79: 448-460.
Thomson, J.D., G. Weiblen, B.A. Thomson, S. Alfaro, and P. Legendre. 1996. Untangling multiple factors in spatial distributions: lilies, gophers, and rocks. Ecology 77: 1698-1715.