Lab 9. Geographic Cluster Analysis
- This week we’ll do spatial cluster analysis. First, go to
Data_2006\Database\Analysis\Analysis.mdb and export lowerGF_BG to a new
shapefile in your temp directory. At the end of the lab you’ll transfer
this file back to your zoo account. Throughout this lab, if you have
any trouble understanding the methods, check the software’s help
or go to http://www.terraseer.com/help/boundaryseer/
- Open Boundary Seer. Create a new project when it prompts you. Click Vector,
then browse to the shapefile you exported in the previous step. In the
next window choose “planar/projection” and click Next. In the following
window, import P_pavement and P-coarseveg as numeric variables
(shift-click them and hit the “numeric>>” button), import DESC_5, DESC_15
and DESC_62 as categorical, and import everything else below DESC62 as
numeric. Click Next and choose to put the map in a New Map. You can view
numeric attributes by double-clicking lowerbgNum (the numeric variables)
in the table of contents (TOC) on the left and then changing the variable
under “use single numeric variable.” You can do the same for categorical
(PRIZM) variables by clicking on “lowerbgCat.” Start by unchecking the
categorical variable and then changing the numeric variable being viewed
to P_coarse_veg. Also create a new color scheme within that window (such
as dark to light green). Take a screencapture of the resulting map. Before
going on, save the project (file>>save project).
- Now let’s try a simple clustering that differentiates neighborhoods based
on crime, income and vegetation. Click data>>detect
boundary>>constrained clustering. Before choosing the number of
clusters, we’ll run an optimization to help us choose. Choose your numeric
data set as the Data (probably called lowergfNum). Check “measure goodness
of fit for multiple partitions” and choose a minimum of 3 and a maximum of
15. Then hit Edit Variable Set and hit “create new set.” Under the “in
set” column check MED_HH_INC, p_coarseveg and Robb2005. Then click Close,
and click Yes in the box asking if you want to save those changes. Make
sure “detect boundary using” is set to “variable set: varset1.” Click OK.
Choose to standardize using z scores. Click OK. Now look at the graph and
take a screencapture. You should note that the highest fit value occurs at
5 classes, but the largest jump in fit occurs at about 14. Since high fit
values and large jumps are both good, let’s try both 5 and 14.
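If it helps to see the two selection rules spelled out, here is a minimal Python sketch. The fit values in `fit_by_k` are made up for illustration (they are not BoundarySeer output), and the function names are hypothetical; replace the numbers with the ones you read off your own graph.

```python
# Illustrative goodness-of-fit values for 3..15 clusters.
# These numbers are invented for the sketch, not taken from the software.
fit_by_k = {
    3: 0.40, 4: 0.45, 5: 0.62, 6: 0.50, 7: 0.47, 8: 0.44, 9: 0.42,
    10: 0.41, 11: 0.39, 12: 0.38, 13: 0.30, 14: 0.55, 15: 0.52,
}


def best_fit_k(fit):
    """Return the cluster count with the highest fit value."""
    return max(fit, key=fit.get)


def biggest_jump_k(fit):
    """Return the cluster count whose fit rose most compared with k - 1."""
    ks = sorted(fit)
    jumps = {k: fit[k] - fit[prev] for prev, k in zip(ks, ks[1:])}
    return max(jumps, key=jumps.get)


print(best_fit_k(fit_by_k))      # 5
print(biggest_jump_k(fit_by_k))  # 14
```

The two rules can disagree, as they do here, which is why the lab has you run both candidate cluster counts.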
- Go back to the constrained clustering window. Set the target number of
clusters to 5, and again choose VarSet1. Under New cluster data name type
constrained5, and under New boundary name type Constrained_bound5. In
the next window choose to standardize by z score and name the “new data
set” constrained5data. Then click OK. This will take several minutes.
When prompted, choose to view the clusters in the existing map (should be
“map 1”) and take a screencapture. You should see color-coded clusters
plus boundaries defining the clusters. Next, to help you interpret the
clusters, make only the boundaries (“boundary1”) and the quantitative
attributes layer visible so you can see how each cluster corresponds to
values of the three variables. Remember that you can change the color
ramp and the variable being mapped by double-clicking the constrained5_num
layer in the table of contents. Also check out the table showing the
variable averages for each class (keep in mind that the numbers you’ll see
are standardized z scores): click Project>>table and then click on
lowergf Num Clusters (Cluster Statistics).
Using that table, report how many polygons are in each cluster and try to
come up with a basic description of each cluster (e.g. high crime, high
income, low trees, etc.). Note, however, that one cluster has only one
member. Based on the statistics, why does this block group seem to have
been so different that it needed its own class?
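To make the cluster statistics table easier to read, here is a small Python sketch of what a z score is and how per-cluster counts and averages could be computed. The data values and cluster labels are hypothetical, the function names are made up for this example, and the population standard deviation is an assumption; the software may use the sample version.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values to mean 0 and standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population SD
    return [(v - mu) / sigma for v in values]


def cluster_summary(labels, columns):
    """Return {cluster: (polygon count, {variable: mean z score})}."""
    z = {name: z_scores(vals) for name, vals in columns.items()}
    summary = {}
    for c in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        means = {name: mean([z[name][i] for i in idx]) for name in z}
        summary[c] = (len(idx), means)
    return summary


# Hypothetical values for six block groups (not the lab data).
columns = {
    "MED_HH_INC": [20000, 25000, 30000, 60000, 65000, 70000],
    "Robb2005": [9, 8, 7, 2, 1, 3],
}
labels = [1, 1, 1, 2, 2, 2]

for cluster, (n, means) in cluster_summary(labels, columns).items():
    print(cluster, n, {k: round(v, 2) for k, v in means.items()})
```

A negative mean z score means the cluster sits below the study-area average for that variable, and a positive one means it sits above. In this toy data, cluster 1 would be described as low income and high robbery.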
- Now try the same thing using 14 classes (although don’t bother interpreting
the table), calling the outputs Constrained14 and constrained14bound. Take
a screencapture of the map.
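For intuition about what “constrained” means, here is a generic Python sketch of spatially constrained agglomerative clustering: polygons are merged only if they touch. This is not BoundarySeer's documented algorithm; the centroid-distance merge rule, the function name and the toy data are all assumptions made for illustration.

```python
from math import dist


def constrained_clusters(values, neighbors, target):
    """Merge adjacent polygons until `target` clusters remain.

    values: {polygon id: tuple of standardized variables}
    neighbors: set of frozenset({a, b}) pairs of polygons sharing an edge
    Returns {polygon id: cluster id}.
    """
    members = {p: [p] for p in values}  # cluster id -> polygon ids
    owner = {p: p for p in values}      # polygon id -> cluster id

    def centroid(c):
        pts = [values[p] for p in members[c]]
        return tuple(sum(col) / len(pts) for col in zip(*pts))

    while len(members) > target:
        # Candidate merges: only clusters that touch each other.
        touching = set()
        for pair in neighbors:
            a, b = tuple(pair)
            if owner[a] != owner[b]:
                touching.add(frozenset((owner[a], owner[b])))
        if not touching:
            break  # the remaining clusters are not adjacent
        a, b = min(
            (tuple(sorted(pair)) for pair in touching),
            key=lambda ab: dist(centroid(ab[0]), centroid(ab[1])),
        )
        for p in members.pop(b):
            members[a].append(p)
            owner[p] = a
    return owner


# Four hypothetical polygons in a row: 0 - 1 - 2 - 3
values = {0: (0.0,), 1: (0.2,), 2: (5.0,), 3: (0.3,)}
neighbors = {frozenset((0, 1)), frozenset((1, 2)), frozenset((2, 3))}
owner = constrained_clusters(values, neighbors, target=2)
print(owner)  # {0: 0, 1: 0, 2: 2, 3: 2}
```

Polygon 3 has a value close to polygons 0 and 1, but it ends up grouped with polygon 2 because that is its only neighbor. That is the kind of result the adjacency constraint produces and an ordinary (unconstrained) clustering would not.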
- Now try using only socio-economic variables: Med_hh_inc (median income),
P-Ownocc (percent owner occupied), P_occ (percent occupied structures),
P_SFDH (percent single-family detached homes), MED_YR_ALL (median home
year), P_WH (percent white population) and TotCrim2005 (crime index).
Remember that you choose these by clicking “edit variable sets,” clicking
“create new set” (should be varset2) and then checking the variables you
want. Then check Measure Goodness of Fit and choose 3 and 15 for the
minimum and maximum number of clusters. Give the outputs names similar to
those you used before, like constrained8 and constrained8_bound. Again
choose to standardize by z score. Take a screencapture of the resulting
graph. You should see a major spike at 8 clusters. This is a good sign,
because we’re looking for both a high fit value and a sudden increase to
help us determine the best number. Go ahead and run a constrained
clustering with that number of clusters and the specified variable set,
again standardizing by z score. Take a screencapture of the resulting map.
Now compare it to PRIZM 62. You should still have a layer called
lower_gf_Cat at the bottom of your table of contents on the left side
(double-click it and, under “use single categorical variable,” make sure
that DESC62 is selected). Check that layer to make it visible and uncheck
everything above it except constrained8_bound. You can now compare PRIZM
to the 8-class clustering, because only the boundary outlines from your
8-class clustering are overlaid on your PRIZM62 solid-fill map. Note how
many similarities there are, although the two are still different. Take a
screencapture and caption it. It should look something like this

- Try a quick Wombling boundary detection algorithm. Go to Data>>Detect
Boundary>>Wombling. Choose lowerbgNum (or whatever your numeric data
set is called from the first step). Choose varset1 (from step 3) as your
variable set and check the option to standardize the data. Under the
Thresholds tab make sure that the boundary type is “crisp” and that
boundaries are defined based on the top 20% of likelihood values. Then
click OK. In the next window just check “z-score” and change the name of
the new data set to Wombledata. Next you should see a histogram with a big
vertical line. Describe what this histogram and vertical line are telling
you. Then click “yes” in the “view boundary in map” interface. You should
see two new layers: womblingbound:BLV and womblingbound. The former gives
the boundary likelihood value for each boundary, and the latter shows, in
a single color, only those boundaries that are over the defined threshold.
Click womblingbound and view those boundaries overlaid on your 14-class
cluster map from step 5. Take a screencapture and describe what you see.
How well do these wombled boundaries describe the mapped cluster
boundaries? How many wombled boundaries are internal, that is, located
where there is no boundary between two cluster classes (i.e. the same
class membership on either side of the boundary)? Now try overlaying them
on the 5-class cluster layer from step 4. Take another screencapture and
say how many “internal” boundaries there are now.
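The ideas in this step (boundary likelihood values, a top-20% crisp threshold, and “internal” boundaries) can be sketched in a few lines of Python. The lab does not give the software's BLV formula, so the Euclidean distance between neighboring polygons' standardized values used here is an assumption, and the function names and toy data are hypothetical.

```python
from math import dist


def boundary_likelihoods(values, edges):
    """BLV per shared edge: distance between the two neighbors' z scores."""
    return {edge: dist(values[edge[0]], values[edge[1]]) for edge in edges}


def crisp_boundaries(blv, top_fraction=0.20):
    """Keep the edges whose BLV falls in the top `top_fraction`."""
    keep = max(1, round(len(blv) * top_fraction))
    ranked = sorted(blv, key=blv.get, reverse=True)
    return set(ranked[:keep])


def internal_boundaries(boundaries, cluster_of):
    """Boundaries whose two sides belong to the same cluster."""
    return {e for e in boundaries if cluster_of[e[0]] == cluster_of[e[1]]}


# Six hypothetical polygons in a row, one standardized variable each.
values = {0: (0.0,), 1: (0.1,), 2: (1.0,), 3: (3.0,), 4: (3.2,), 5: (3.3,)}
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
cluster_of = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}

blv = boundary_likelihoods(values, edges)
top20 = crisp_boundaries(blv, 0.20)
top40 = crisp_boundaries(blv, 0.40)
print(sorted(top20))                                   # [(2, 3)]
print(sorted(internal_boundaries(top40, cluster_of)))  # [(1, 2)]
```

With the stricter threshold the only wombled boundary coincides with the cluster boundary. Loosening it to 40% adds an edge inside cluster A, which is what the lab calls an internal boundary.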
- Finally, we’ll try a fuzzy classification. Go to Data>>detect
boundary>>fuzzy classification. Choose 5 classes (for comparison
with step 4). Choose varset1 as your variable data set. Change the class
data name to fuzzyclass and the boundary name to fuzzybound. Under the
“method” tab, uncheck “detect boundaries using.” Then click OK and
choose z-score standardization. Change the new data set name to fuzzydata.
Click OK. View the output in the map. What you are seeing is a map showing
the likelihood that a given polygon belongs to what the software calls
class 1. However, to make things a little confusing, its class 1 is
different from what was called class 1 in your step 4 clustering. Hence,
if you toggle between the 5-class map and this new map, you’ll see that
the polygons with high likelihoods for class 1 actually correspond to
class 3 in the old map. To look at membership likelihood in another class,
just double-click the layer in the table of contents and, under “use
single numeric variable,” change the variable to class 2.
Take a screencapture and describe this map, also noting which of your
original clusters from the 5-class map class 2 seems to correspond with.
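Here is a minimal Python sketch of how membership likelihoods can be computed and how fuzzy class numbers can be matched to the earlier hard clusters. It uses the standard fuzzy c-means membership formula; whether the software uses exactly this formulation is an assumption, and the class centers, points, labels and function names are hypothetical.

```python
from collections import Counter
from math import dist


def memberships(point, centers, m=2.0):
    """Fuzzy c-means membership of one point in each class (sums to 1)."""
    d = [dist(point, c) for c in centers]
    zeros = [x == 0.0 for x in d]
    if any(zeros):  # the point sits exactly on one or more centers
        return [z / sum(zeros) for z in zeros]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dj) ** power for dj in d) for di in d]


def class_correspondence(fuzzy_u, hard_labels):
    """Map each fuzzy class to the hard cluster it most often coincides with."""
    pairs = Counter()
    for u, hard in zip(fuzzy_u, hard_labels):
        fuzzy_class = u.index(max(u)) + 1  # classes numbered from 1
        pairs[(fuzzy_class, hard)] += 1
    best = {}
    for (f, h), _count in pairs.most_common():
        best.setdefault(f, h)  # first hit per fuzzy class has the top count
    return best


# Hypothetical class centers and polygons with one standardized variable.
centers = [(0.0,), (4.0,)]
points = [(0.5,), (1.0,), (3.0,), (3.9,)]
hard_labels = [3, 3, 1, 1]  # cluster numbers from an earlier hard clustering

fuzzy_u = [memberships(p, centers) for p in points]
print([round(u, 2) for u in fuzzy_u[0]])
print(class_correspondence(fuzzy_u, hard_labels))  # {1: 3, 2: 1}
```

Class numbers are arbitrary labels in both methods, so a table like this is a quick way to confirm which fuzzy class lines up with which of your original clusters.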
- Assemble your text and images into a PDF.