Lab 9. Geographic Cluster Analysis

 

  1. This week we’ll do spatial cluster analysis. First, go to Data_2006\Database\Analysis\Analysis.mdb and export lowerGF_BG to a new shapefile located in your temp directory. At the end of the lab you’ll transfer this file back to your zoo account. Note: throughout this lab, if you have any trouble understanding the methods, check the help for the software, or go to http://www.terraseer.com/help/boundaryseer/
  2. Open BoundarySeer. Create a new project when it prompts you. Click Vector, then browse to the shapefile you exported in the last step; in the next window choose “planar/projection” and click Next. In the following window, choose to import P_pavement and P-coarseveg as numeric variables (shift-click and hit the “numeric>>” button), import DESC_5, DESC_15 and DESC_62 as categorical, and everything else below DESC62 as numeric. Click Next and then choose to put the map in a New Map. You can view numeric attributes by double-clicking lowerbgNum (for numeric variables) in the table of contents (TOC) on the left and then, under “use single numeric variable,” changing the variable to view. You can do the same for categorical (PRIZM) variables by clicking on “lowerbgCat.” Start by unchecking the categorical variable and then changing the numeric variable being viewed to P_coarse_veg. Create a new color scheme within that window as well (such as dark to light green). Take a screencapture of the resulting map. Before going on, save the project (file>>save project).
  3. Now let’s try a simple clustering that differentiates neighborhoods based on crime, income and vegetation. Click data>>detect boundary>>constrained clustering. Before choosing the number of clusters, we’ll run an optimization to help us choose. Choose your numeric data set as the Data (probably called lowergfNum). Check “measure goodness of fit for multiple partitions” and then choose a minimum of 3 and a maximum of 15. Then hit Edit Variable Set and hit “create new set.” Under the “in set” column check MED_HH_INC, p_coarseveg and Robb2005. Then click Close, and click Yes in the box asking if you want to save those changes. Make sure your “detect boundary using” is set to “variable set: varset1.” Click OK. Choose to standardize using z-scores. Click OK. Now look at the graph and take a screencapture. You should note that the highest fit value occurs at 5 classes, but the largest jump in fit occurs at about 14. Since high fit values and high jumps are both good, let’s try both 5 and 14.
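If it helps to see the "highest value versus biggest jump" reasoning from step 3 spelled out, here is a minimal Python sketch. It is not how BoundarySeer computes its goodness of fit; the function name `candidate_cluster_counts` and the fit values are hypothetical, made up only to show how the two candidate numbers are read off a graph like yours.

```python
def candidate_cluster_counts(fit_by_k):
    """Return (k with the highest fit, k with the largest jump over the previous k)."""
    ks = sorted(fit_by_k)
    # Candidate 1: the number of clusters with the highest fit value.
    best_k = max(ks, key=lambda k: fit_by_k[k])
    # Candidate 2: the number of clusters where fit rises most from the previous count.
    jumps = {k: fit_by_k[k] - fit_by_k[prev] for prev, k in zip(ks, ks[1:])}
    jump_k = max(jumps, key=jumps.get)
    return best_k, jump_k


# Hypothetical fit values for 3 to 8 clusters (NOT real lab output).
fit = {3: 0.40, 4: 0.42, 5: 0.60, 6: 0.35, 7: 0.30, 8: 0.55}

print(candidate_cluster_counts(fit))  # (5, 8)
```

With these made-up numbers the highest fit is at 5 clusters and the largest jump is at 8, so both would be worth trying, just as the lab tries both 5 and 14.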
  4. Go back to the constrained clustering window. Choose 5 as the target number of clusters, and again choose VarSet1. Under New cluster data name type constrained5, and under New boundary name type Constrained_bound5. In the next window choose to standardize by z-score and name the “new data set” constrained5data. Then click OK. This will take several minutes. When prompted, choose to view the clusters in the existing map (should be “map 1”) and take a screencapture. You should see color-coded clusters plus boundaries defining the clusters. Next, to help you interpret the clusters, make only the boundaries layer (“boundary1”) and the quantitative attributes layer visible so you can see how each cluster corresponds to values for the three variables. Remember that you can change the color ramp and the variable being mapped by double-clicking on the constrained5_num layer in the table of contents. Also check the table showing the variable averages for each class (keep in mind that the numbers you’ll see are standardized z-scores): click Project>>table and then click on lowergf Num Clusters (Cluster Statistics). Using that, report how many polygons are in each cluster and try to come up with a basic description of each cluster (e.g. high crime, high income, low trees, etc.). Note, however, that one cluster has only one member. Based on the statistics, why does it seem that this block group was so different that it needed its own class?
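To make the Cluster Statistics table in step 4 easier to read, here is a rough Python sketch of what "average z-score per cluster" means. Everything in it is an assumption for illustration: the values are invented, the cluster labels are hypothetical, and the population standard deviation is used, which may differ from what BoundarySeer does internally.

```python
from collections import defaultdict
from statistics import mean, pstdev


def zscores(values):
    """Standardize values to mean 0 and standard deviation 1 (population formula)."""
    m = mean(values)
    s = pstdev(values)  # assumes the values are not all identical (s > 0)
    return [(v - m) / s for v in values]


def cluster_summary(labels, columns):
    """For each cluster: number of polygons and the mean z-score of every variable."""
    z = {name: zscores(vals) for name, vals in columns.items()}
    members = defaultdict(list)
    for i, label in enumerate(labels):
        members[label].append(i)
    summary = {}
    for label, idx in members.items():
        row = {"n": len(idx)}
        for name in z:
            row[name] = mean(z[name][i] for i in idx)
        summary[label] = row
    return summary


# Hypothetical block groups (NOT the lab data): raw values and cluster labels.
columns = {
    "MED_HH_INC": [30000, 32000, 31000, 90000, 95000, 250000],
    "P_coarseveg": [10, 12, 11, 40, 45, 80],
}
labels = [1, 1, 1, 2, 2, 3]

for label, row in sorted(cluster_summary(labels, columns).items()):
    print(label, row)
```

In this toy data, cluster 3 has a single member whose mean z-scores sit far above zero on both variables. That is the kind of pattern to look for when you explain the one-member cluster in your own table.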
  5. Now try the same thing using 14 classes (although don’t bother interpreting the table), calling the outputs Constrained14 and constrained14bound. Take a screencapture of the map.
  6. Now try using only socio-economic variables: Med_hh_inc (median income), P-Ownocc (percent owner occupied), P_occ (percent occupied structures), P_SFDH (percent single-family detached homes), MED_YR_ALL (median home year), P_WH (percent white population) and TotCrim2005 (crime index). Remember that you choose these by clicking “edit variable sets,” clicking “create new set” (should be varset2) and then checking the variables you want. Then check “measure goodness of fit” and choose 3 and 15 for the minimum and maximum number of clusters. Give the outputs names similar to those you gave before, like constrained8 and constrained8_bound. Again choose to standardize by z-score. Take a screencapture of the resulting graph. You should see a major spike at 8 clusters. This is a good sign, because what we’re looking for is both a high value and a sudden increase to help us determine the best number. Then run a constrained clustering with that number of clusters and the specified variable set, again standardizing with z-score. Take a screencapture of the resulting map. Now compare it to PRIZM 62. You should still have a layer called lower_gf_Cat at the bottom of your table of contents on the left side (double-click on it and under “use single categorical variable” make sure that DESC62 is selected). Check that layer to make it visible and uncheck everything above it except constrained8_bound. You can then compare PRIZM to the 8-class clustering by looking at the boundary outlines from your 8-class clustering overlaid on your PRIZM62 solid fill map. Note how many similarities there are, although the two are still different. Take a screencapture and caption it. It should look something like this
  7. Try a quick Wombling boundary detection algorithm. Go to Data>>Detect Boundary>>Wombling. Choose lowerbgNum (or whatever your numeric data set is called from the first step). Choose varset1 (from step 3) as your variable set and check the box to standardize the data. Under the Thresholds tab make sure that the boundary type is “crisp” and that you define boundaries based on the top 20% of likelihood values. Then click OK. In the next window just check “z-score” and change the name of the new data set to Wombledata. Next you should see a histogram with a big vertical line. Describe what this histogram and vertical line are telling you. Then click “yes” in the “view boundary in map” interface. You should see two new layers: womblingbound:BLV and womblingbound. The former gives the boundary likelihood values for each boundary, and the latter shows, in a single color, only those that are over the defined threshold. Click womblingbound and view those boundaries overlaid on your 14-class cluster map from step 5. Take a screencapture and describe what you see. How well do these wombled boundaries describe the mapped cluster boundaries? How many wombled boundaries are internal, that is, located where there is no boundary between two cluster classes (i.e. the same class membership on either side of the boundary)? Now try overlaying them on the 5-class cluster layer from step 4. Take another screencapture and report how many “internal” boundaries there are now.
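The "top 20% of likelihood values" threshold and the idea of an "internal" boundary in step 7 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the boundary likelihood values (BLVs), polygon names, and cluster labels below are invented, and the two helper functions are hypothetical, not part of BoundarySeer.

```python
def crisp_boundaries(blv, top_percent=20):
    """Keep the edges whose BLV falls in the top `top_percent` of all values.

    blv maps an edge (a pair of adjacent polygon ids) to its likelihood value.
    Returns (threshold, set of edges at or above the threshold).
    """
    ranked = sorted(blv.values(), reverse=True)
    # Round the number of kept edges up, and always keep at least one.
    n_keep = max(1, (top_percent * len(ranked) + 99) // 100)
    threshold = ranked[n_keep - 1]
    return threshold, {edge for edge, value in blv.items() if value >= threshold}


def internal_boundaries(edges, cluster_of):
    """Edges whose two polygons belong to the same cluster."""
    return {(a, b) for a, b in edges if cluster_of[a] == cluster_of[b]}


# Hypothetical BLVs for ten shared edges among polygons A..E (NOT lab data).
blv = {
    ("A", "B"): 0.90, ("B", "C"): 0.10, ("C", "D"): 0.80, ("D", "E"): 0.20,
    ("A", "C"): 0.30, ("B", "D"): 0.15, ("C", "E"): 0.25, ("A", "D"): 0.05,
    ("B", "E"): 0.12, ("A", "E"): 0.40,
}
# Hypothetical cluster membership for each polygon.
cluster_of = {"A": 1, "B": 2, "C": 2, "D": 2, "E": 3}

threshold, crisp = crisp_boundaries(blv)
print(threshold, sorted(crisp))
print(sorted(internal_boundaries(crisp, cluster_of)))
```

Here the threshold plays the role of the vertical line on the histogram: edges to its right become crisp boundaries. The edge between C and D passes the threshold even though both polygons are in the same cluster, so it counts as one internal boundary.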
  8. Finally, we’ll try a fuzzy classification. Go to Data>>detect boundary>>fuzzy classification. Choose 5 classes (for comparison with step 4). Choose varset1 as your variable data set. Change the class data name to fuzzyclass and the boundary name to fuzzybound. Under the “method” tab, uncheck “detect boundaries using.” Then click OK and choose z-score standardization. Change the new data set name to fuzzydata. Click OK. View the output in the map. What you are seeing is a map showing the likelihood that a given polygon actually belongs to what it calls class 1. However, to make things a little confusing, its class 1 is different from what was called class 1 in your step 4 clustering. Hence, if you toggle between the 5-class map and this new map, you’ll see that the polygons with high likelihoods for class 1 actually correspond to class 3 in the old map. To look at membership likelihood in another class, just double-click on the layer in the table of contents and under “use single numeric variable,” change the variable to class 2, as in the example screenshot. Take a screencapture and describe this map, also noting which of your original clusters from the 5-class map class 2 seems to correspond with.
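To see why the fuzzy class numbers in step 8 need not line up with the step 4 cluster numbers, here is a small Python sketch. It uses the standard fuzzy c-means membership formula as a stand-in; whether BoundarySeer uses exactly this formula is an assumption, and the class centers, points, and hard labels are all invented for illustration.

```python
import math
from collections import Counter


def memberships(point, centers, m=2.0):
    """Fuzzy c-means style membership of one point in each class (sums to 1)."""
    d = [math.dist(point, c) for c in centers]
    zeros = d.count(0.0)
    if zeros:  # the point sits exactly on one or more centers
        return [1.0 / zeros if x == 0.0 else 0.0 for x in d]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dk) ** power for dk in d) for di in d]


def match_classes(fuzzy_u, hard_labels):
    """Map each fuzzy class to the hard cluster most common among its strongest members."""
    votes = {}
    for u, label in zip(fuzzy_u, hard_labels):
        top = max(range(len(u)), key=u.__getitem__) + 1  # 1-based fuzzy class number
        votes.setdefault(top, Counter())[label] += 1
    return {fuzzy: counts.most_common(1)[0][0] for fuzzy, counts in votes.items()}


# Hypothetical standardized values for four polygons and two class centers.
centers = [(0.0, 0.0), (3.0, 3.0)]
points = [(0.1, 0.0), (0.2, 0.1), (2.9, 3.0), (3.1, 2.8)]
hard_labels = [3, 3, 1, 1]  # hypothetical cluster numbers from a hard clustering

u = [memberships(p, centers) for p in points]
print(match_classes(u, hard_labels))  # {1: 3, 2: 1}
```

The numbering of fuzzy classes is arbitrary, so in this toy case fuzzy class 1 lines up with hard cluster 3. Matching on which polygons have the highest membership, rather than on the class number, is the same comparison you make by toggling between the two maps.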
  9. Assemble your text and images into a single PDF.