**Lab 9. Spatial clustering, PCA and spatial lag regression**

Due Wed April 11.

1. Download BG_BACI_BACO.shp, BG_BACI_BACO_Blank.shp, and BGBACIBACO.sav from the NR 245/lab9 directory online into C:\temp.

2. __Spatial Cluster analysis__. Open SAM 4.0, go to file>>open>>open
shapefile, and load BG_BACI_BACO.shp. You should see an interface called
"data settings." (If you don't, click on the fifth icon from the
left, which looks like a stack of papers.) Go to the connectivity matrix tab.
Click "create/edit". Click the connectivity criterion tab and choose
"Gabriel criterion." Now click "Create," then "Close." Then click
Structure>>Cluster and Spatial Cluster.
In the interface, control-click to choose the following variables:
P_BLK, P_BACH, MED_AGE, MED_HH_INC, P_OWNOCC, P_SFDH, P_PROTLAND, and P_HH_RUR.
Choose 6 clusters, and make sure "spatially constrained" is checked,
with "Gabriel criterion" selected below it. Keep everything else at the
default. Click Calculate. This might take a while. Go have a cup of coffee,
or start on the next step while this is running. **Q1. Present the group
size of each cluster.** Don't click on the graphical results; it will probably
crash. Instead, hit the X on the cluster interface to close it. It will then
ask whether you want to save the 1 unsaved variable. Click Yes. Call the field
Cluster. Then click on the data save as/exportation button (fourth icon from
the left on the main menu). Uncheck everything but joinID and Cluster. Save the
output in DBF file format. Click the little disk icon, browse to your lab 9
directory, and name the file clusters. Then click Export. Now open ArcMap and
join the BG_BACI_BACO_Blank layer to this table using JoinID. **Now plot and
screen capture the Cluster field using unique values symbology and a
high-contrast color scheme.**
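
For intuition about the connectivity step above, here is a minimal Python sketch of the Gabriel criterion: two points are connected when no third point falls inside the circle whose diameter is the segment joining them. The function name `gabriel_edges` and the coordinates are illustrative assumptions, not part of SAM or of the lab data, and SAM's own implementation may differ in details such as how points exactly on the circle are handled.

```python
from itertools import combinations


def gabriel_edges(points):
    """Return index pairs (i, j) that are neighbors under the Gabriel criterion."""
    edges = []
    for i, j in combinations(range(len(points)), 2):
        (xi, yi), (xj, yj) = points[i], points[j]
        # Circle whose diameter is the segment i-j: center at the midpoint,
        # squared radius equal to a quarter of the squared segment length.
        cx, cy = (xi + xj) / 2, (yi + yj) / 2
        r2 = ((xi - xj) ** 2 + (yi - yj) ** 2) / 4
        # The pair is blocked if any other point lies strictly inside the circle.
        blocked = any(
            (xk - cx) ** 2 + (yk - cy) ** 2 < r2
            for k, (xk, yk) in enumerate(points)
            if k not in (i, j)
        )
        if not blocked:
            edges.append((i, j))
    return edges


# Hypothetical centroids (NOT taken from the lab shapefile).
centroids = [(0, 0), (2, 0), (1, 0.2), (1, 3)]
print(gabriel_edges(centroids))
```

In this toy example, points 0 and 1 are not connected because point 2 sits between them. A spatially constrained clustering only merges units that are linked in a graph of this kind.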

3. __Principal Components/Factor Analysis__.
Open SPSS/PASW Statistics. When it prompts you to open a file at startup,
choose BGBACIBACO.sav. Click
Analyze>>Dimension reduction>>factor. Now let's choose variables
that will be dimensionally reduced into an index that relates to socio-economic
population characteristics. Shift-click the following, then click the right
arrow to add them to the variables window: MED_HH_INC, P_OWNOCC, P_VAC, P_SFDH,
MED_VAL_AL, P_BLK, PBACH, P_transit, PEMP, P_HH_RUR. Then click on "Descriptives,"
check "Coefficients" and "Significance levels," then click
"continue." Next click on "extraction" and under
"based on eigenvalue" choose greater than 0.6, then
"continue." Then choose "scores," check "Save as
variables," and click "continue." Then click "OK". Now
interpret what you see. **Q2. Look at the
correlation matrix. Which two variables have the highest correlation (excluding
the 1’s on the diagonal)? Look at the Communalities table. Which variable has the highest
proportion of its variance explained by these principal components? Then look
at the Total Variance Explained table. How many principal components have
an eigenvalue above 0.6? Copy and paste that table and report what percentage of the
cumulative variance is explained by the components that are above 0.6. Finally, look at the component matrix and
describe one variable that has a strong negative influence on component 1 and
one that has a strong positive influence.** Now look at your data matrix. You
should see several new columns. Those are your principal components. Now, let’s
export this. Go to file>>save as
and in the “save as type” box choose comma delimited (csv). Then hit save. Now,
let’s quickly map those out. Open ArcMap,
load BG_BACI_BACO_blank, and join this new table using joinID. **Plot and screen capture Factor 1 using
graduated color symbology.** Think about what this map is telling you in
light of the factor loadings, although you don’t need to report it. Now export
this joined layer from ArcMap (right click in table of
contents>>data>>Export data) and save the output as a shapefile
called PCA.shp in your lab 9 directory.
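
The bookkeeping behind the Total Variance Explained table can be sketched in a few lines of Python: count the components whose eigenvalue exceeds the cutoff, then divide their summed eigenvalues by the total. The function name `retained_components` is hypothetical, and the eigenvalues below are made up for illustration (they only mimic the fact that eigenvalues of a 10-variable correlation matrix sum to 10); they are not the lab's results.

```python
def retained_components(eigenvalues, threshold=0.6):
    """Count components above the eigenvalue threshold and their cumulative % variance."""
    total = sum(eigenvalues)
    kept = [ev for ev in eigenvalues if ev > threshold]
    cumulative_pct = 100 * sum(kept) / total
    return len(kept), cumulative_pct


# Made-up eigenvalues for 10 standardized variables; NOT output from SPSS.
example_eigenvalues = [4.0, 2.0, 1.5, 1.0, 0.5, 0.4, 0.3, 0.2, 0.06, 0.04]
n_kept, pct = retained_components(example_eigenvalues)
print(n_kept, round(pct, 1))
```

With these invented numbers, four components pass the 0.6 cutoff and together account for about 85% of the variance. Your own table will give different values.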

4. __Spatial Lag Regression__. Now open the new version of
OpenGeoDa, which you will download from the NR245/lab9 directory. Save it in
C:\temp or on an external drive. Double-click it and it should run. Go to
file>>open shapefile. Browse to PCA and open it. Try plotting a
variable on the map: go to Map>>standard deviation, choose
lncrime, and click OK (no need to screen capture). Now go to
Tools>>weights>>create.
Choose ObjectID as the Weights File ID variable. Then click on Rook
Contiguity and keep the order at 1. Click “Create” and name the matrix “PCArook.” Next, click
Methods>>regress. Check the Moran’s I value box. Click OK. Now let’s run
the regression. We’ll start with a linear regression. Choose lncrime (log of crime) as the
dependent variable. Choose your four PCs plus TC_E_P (tree percentage), P_Agpr (percent
agriculture), and P_protland as independent variables. Choose “classic” as the type of regression. Check
“weights file” and choose PCArook. Then click “run”. Look at the output. When
it’s done, click “Save to table” and choose the residuals to save to your
table. Then click “results” and look at the output. First, take a look at some
of the regression diagnostics. **Q3.
What’s the R-squared? Are all variables significant? Report the
multicollinearity condition number. What does this tell us about our model
based on what we learned in class? Why would we expect this result given our
use of PCs? Next, answer some questions
about the spatial diagnostics for the model.
Report the Moran’s I test on the error (residuals). What is its
significance, and what does this tell us? What about the Lagrange multiplier
tests on lag and error? What does the
“Diagnostics for Spatial Dependence for Weight Matrix” section tell you about
whether the spatial lag or spatial error model is likely to be better, and
why?**
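
To make the residual Moran's I in Q3 more concrete, the sketch below builds first-order rook contiguity on a regular grid and computes global Moran's I with row-standardized weights. It is an illustration under stated assumptions: the grid, the values, and the function names `rook_neighbors` and `morans_i` are hypothetical, real block groups are irregular polygons, and OpenGeoDa's significance test for regression residuals involves more than this raw statistic.

```python
def rook_neighbors(n_rows, n_cols):
    """Map each grid cell index to the cells sharing an edge with it (rook, order 1)."""
    neighbors = {}
    for r in range(n_rows):
        for c in range(n_cols):
            adjacent = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    adjacent.append(rr * n_cols + cc)
            neighbors[r * n_cols + c] = adjacent
    return neighbors


def morans_i(values, neighbors):
    """Global Moran's I with row-standardized weights (assumes every unit has a neighbor)."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    # Each neighbor of unit i gets weight 1 / (number of neighbors of i).
    cross = sum(
        z[i] * sum(z[j] for j in nbrs) / len(nbrs) for i, nbrs in neighbors.items()
    )
    s0 = float(n)  # every row of weights sums to 1, so the total weight is n
    return (n / s0) * cross / sum(zi * zi for zi in z)


# Hypothetical "residuals" on a 4 x 4 grid (NOT lab output).
grid = rook_neighbors(4, 4)
clustered = [1.0 if c < 2 else -1.0 for r in range(4) for c in range(4)]
checkerboard = [1.0 if (r + c) % 2 == 0 else -1.0 for r in range(4) for c in range(4)]
print(round(morans_i(clustered, grid), 3))
print(round(morans_i(checkerboard, grid), 3))
```

The clustered pattern gives a clearly positive value and the checkerboard a strongly negative one. Positive Moran's I in OLS residuals is the signature of leftover spatial dependence that the spatial models try to absorb.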

Next we’ll run a spatial error regression. Keep
everything the same, including leaving the weights file box checked. Then choose
Spatial error as the model and hit run.
It will calculate for a while. When it’s done,
click “Save to table” and choose the residuals. Now run the spatial lag model.
Again, save the residuals to the table. This
time, view the results. The new results window will give you all the results
for the models you’ve run so far. **Q4. Report
the R-squared for the two spatial models. Which is higher? Furthermore, what do the Akaike Info
Criterion and Log Likelihood values from the models tell you about which model is
probably the best? Is this consistent
with the message given by the diagnostics from the OLS? Next, report the
coefficients on the autocorrelation parameters for both the error and lag models
and whether they are significant** (note that the error model autocorrelation
parameter appears in the variable/coefficients table as Lambda, and the lag parameter
appears in two places: as Lag coeff/Rho under the summary output and as
W_lncrime at the top of the variables
table). **Report what happened to the tree
variable TC_E_P across the three models. Why do you think that difference
might be?**
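
As a reminder of how the two fit measures in Q4 relate, the Akaike Information Criterion is AIC = 2k - 2 ln(L), where k is the number of estimated parameters and ln(L) is the log likelihood. A higher log likelihood and a lower AIC both indicate a better-fitting model. The sketch below uses placeholder model names and invented numbers (not lab output), and the parameter counts are assumptions; how OpenGeoDa counts parameters for each model type is not specified here.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: AIC = 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical (log likelihood, number of parameters) pairs; NOT results from the lab.
models = {
    "model_a": (-520.0, 8),
    "model_b": (-495.0, 8),
    "model_c": (-490.0, 9),
}
scores = {name: aic(ll, k) for name, (ll, k) in models.items()}
best = min(scores, key=scores.get)  # lowest AIC is preferred
for name, score in scores.items():
    print(name, score)
print("lowest AIC:", best)
```

Note that AIC penalizes the extra parameter in `model_c`, so a model with a higher log likelihood only wins if the improvement outweighs that penalty.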

**Finally, plot a LISA of the three residuals you saved by clicking
space>>univariate LISA. Do one for each residual, choosing to plot
only the cluster map. Take a screen capture of each. In what way do these maps
show that the two spatial models are a clear improvement on the original OLS
model?**
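
The categories on a LISA cluster map come from comparing each unit's value with the average of its neighbors. The sketch below assigns those quadrant labels for a toy chain of six units. It is a simplified illustration: the function name `lisa_quadrants`, the neighbor structure, and the values are hypothetical, and it omits the permutation-based significance test that determines which locations are actually colored on the cluster map.

```python
def lisa_quadrants(values, neighbors):
    """Label each unit by the sign of its deviation and of its neighbors' mean deviation."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    labels = []
    for i in range(n):
        nbrs = neighbors[i]
        lag = sum(z[j] for j in nbrs) / len(nbrs)  # row-standardized spatial lag
        if z[i] > 0 and lag > 0:
            labels.append("High-High")
        elif z[i] < 0 and lag < 0:
            labels.append("Low-Low")
        elif z[i] > 0 and lag < 0:
            labels.append("High-Low")
        elif z[i] < 0 and lag > 0:
            labels.append("Low-High")
        else:
            labels.append("Undefined")  # value or lag exactly at the mean
    return labels


# Six hypothetical units in a row, each touching the unit before and after it.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
toy_residuals = [3.0, 2.0, -5.0, 2.0, -1.0, -1.0]  # invented, mean is zero
print(lisa_quadrants(toy_residuals, chain))
```

Large High-High or Low-Low patches in a residual map mean neighboring units are over- or under-predicted together, which is what you are checking for when comparing the three models.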

5. Package everything up and upload.