Lab 6: Spatial Regression

NOTE: If data issues keep you from getting the expected results (i.e., the linear regression shows significant spatial autocorrelation in its residuals and the spatial regression does not), you can download the correct version of the data, including the H2Odens variable, from Data_2006\NR245\NR245_backup.mdb\BG_GF_LC_census2

 

  1. Making a neighbor matrix: Open S-Plus and use file>>import data>>from database to import your BG.GF.census2 layer. Now enable the S-Plus spatial module (file>>load module>>spatial). A Spatial item should now appear in the menu bar. Click on Spatial>>neighbors. Choose Nearest neighborhoods as the source, BG.GF.Census2 as the data set, X Centroid as variable 1 and Y Centroid as variable 2. Keep the number of neighbors at 3 and the metric as Euclidean. Under Save In, type BGneighbor3. This creates the neighbor matrix used to spatially weight observations in the spatial regression.
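To make the idea of a nearest-neighbor matrix concrete, here is a minimal Python sketch of what a 3-nearest-neighbor search with a Euclidean metric does. It is not the S-Plus implementation: the function name `nearest_neighbors` and the centroid coordinates are made up for illustration and do not come from the lab data.

```python
import math

def nearest_neighbors(points, k=3):
    """Return {i: [indices of the k nearest other points]} by Euclidean distance."""
    neighbors = {}
    for i, (xi, yi) in enumerate(points):
        # Distance from point i to every other point, closest first
        dists = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        neighbors[i] = [j for _, j in dists[:k]]
    return neighbors

# Hypothetical centroids (x, y); NOT taken from BG.GF.Census2
centroids = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
nbrs = nearest_neighbors(centroids, k=3)

for i, js in nbrs.items():
    print(i, js)
```

Note that nearest-neighbor relationships need not be symmetric: in this toy data, point 3 is one of the three nearest neighbors of point 0, but point 0 is not among the three nearest neighbors of point 3.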

 

  1. Linear regression: Go to statistics>>regression>>linear. Choose BG.GF.Census2 as the dataset, and input the following model:

P.coarse.veg~H2Odens+P.HS.+MED.HH.INC+P.SFDH+P.Protland+log(d2down)+d2ramp

Under the “Results” tab, check residuals and choose to save residuals in BG.GF.Census2. Hit apply, then in the table for BG.GF.Census2, right click on the heading of the newly created field “residuals” and hit properties. Change the Name to resid1. Copy and paste the regression results into your document.
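If it is unclear what the saved residuals are, the following Python sketch may help. It fits a one-predictor least-squares line and computes each residual as the observed value minus the fitted value. This is only an illustration: the lab model has seven predictors, and the function `simple_ols` and the numbers below are invented for the example.

```python
def simple_ols(x, y):
    """Least-squares intercept and slope for a single predictor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Made-up data; NOT from the lab tables
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 5.0, 8.0]

intercept, slope = simple_ols(x, y)

# Residual = observed value minus fitted value
resid1 = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
print(intercept, slope, resid1)
```

A positive residual means the model under-predicted that observation and a negative residual means it over-predicted. With an intercept in the model, least-squares residuals sum to zero.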

 

  1. Spatial regression: Now go to spatial>>spatial regression. Choose BG.GF.Census2 as the data set, input the same model as above, choose SAR as the covariance type, choose BGneighbor3 as the neighbor object, and under the results tab check “residuals” and choose to save them in BG.GF.Census2. Click OK. Copy and paste the regression results (at the beginning of the output, not including the variance-covariance matrix of coefficients or the correlation matrix). Note the differences, if any, between the coefficients of this version and the regular, non-spatial regression from the previous step. Open the BG.GF.Census2 table, right click on the new residuals field heading and click properties. Change the name to resid2.

 

  1. Moran test: Now you’ll see why we did this. Go to spatial>>spatial correlations, choose BG.GF.Census2 as the data set, resid1 as the variable, BGneighbor3 as the neighborhood matrix and moran as the statistic. Click apply. Report the Moran statistic and P value. Now do the same thing for the variable resid2. Again report the Moran statistic and P value. Are they different? What does this tell you about spatial regression?
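For reference, the Moran statistic can be sketched in a few lines of Python. This version assumes simple binary weights (1 if j is listed as a neighbor of i, 0 otherwise). S-Plus may weight or standardize the neighbor matrix differently, and the sketch does not compute a P value. The function name `morans_i` and the toy data are invented for illustration.

```python
def morans_i(values, neighbors):
    """Moran's I with binary weights: w_ij = 1 if j is a listed neighbor of i."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # S0 is the sum of all weights, i.e. the total number of neighbor links
    s0 = sum(len(js) for js in neighbors.values())
    # Cross-products of deviations over every neighbor pair
    cross = sum(dev[i] * dev[j] for i, js in neighbors.items() for j in js)
    denom = sum(d * d for d in dev)
    return (n / s0) * cross / denom

# Four hypothetical locations in a row; each is a neighbor of the adjacent ones
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

clustered = [1.0, 1.0, 5.0, 5.0]    # similar values sit next to each other
alternating = [1.0, 5.0, 1.0, 5.0]  # neighbors always differ

print(morans_i(clustered, chain))    # positive: similar values cluster
print(morans_i(alternating, chain))  # negative: neighbors are dissimilar
```

Values well above zero indicate that similar values tend to be near each other, values near zero indicate little spatial pattern, and negative values indicate that neighbors tend to differ.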

 

  1. View in Arc Map: Export the two residual columns plus the BKG.KEY column from S-Plus using file>>export data>>to database. Choose MS Access database as the To Data Target, BG.GF.Census2 as the From Data frame, type residsp as the Table Name, and under the filter tab, shift click to select BKG.KEY, resid1, and resid2. Click OK. This should now appear as a new table in your geodatabase. In Arc Map, load any block group layer, such as BG.GF.Census2, along with the new table, and do a tabular join to join that table to the block group layer. Go to the symbology window and choose quantities>>graduated color, with resid1 as the value. Hit classify and choose standard deviation as the method. Click OK. Back in the symbology window, choose any color ramp, then click OK. Take a screen capture and caption it. Do the same for resid2.
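As a rough picture of what a standard deviation classification does, the Python sketch below assigns each residual to a class according to how many standard deviations it lies from the mean. It is a simplification under stated assumptions: breaks are placed at whole standard deviations, whereas Arc Map may place its class breaks differently. The function name `std_dev_class` and the residual values are made up.

```python
import math
import statistics

def std_dev_class(values):
    """Class k means the value lies between k and k+1 standard deviations from the mean."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    return [math.floor((v - mean) / sd) for v in values]

# Made-up residuals; NOT from the lab output
residuals = [-4.0, -1.0, 0.0, 1.0, 4.0]
classes = std_dev_class(residuals)
print(classes)
```

Negative classes mark observations the model over-predicted relative to the average residual, and positive classes mark observations it under-predicted.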

 

  1. Second example: Now let’s try this with a new layer. On the share drive go to Data_2006\Database\Analysis\Analysis.mdb and copy and paste the sample_props feature class to your nr245 geodatabase. If you view it in Arc Map you’ll notice it’s a point layer of properties with a number of variables. Load that table into S-Plus using file>>import data>>from database. Now run the following model as a regular regression (statistics>>regression>>linear):

price~NFMIMPVL+ACRES+SQFTSTRC+YEAROLD+TREES.PER+DWTWN.DIST+INSTE.DIST

Under the Results tab, make sure to save residuals in your sample.props table. After doing this, right click on the residuals column heading and change the name to resid1. Copy and paste the coefficient table and briefly note which variables are significant and which are not. Now create a neighbor matrix for this layer. Go to spatial>>spatial neighbors, choose sample.props as the data set, X as variable 1, Y as variable 2, and 5 as the number of neighbors (we’re choosing more because housing points are closer together than block groups), and choose to save the matrix as propneighbor1. Now do a spatial regression (spatial>>spatial regression), with the model just given as the formula, propneighbor1 as the neighbor object, and SAR as the covariance type, and under the results tab choose to save the residuals in sample.props. Copy and paste the coefficient results and describe what is now significant and how that differs from the previous result. You should see one change in the significance of a coefficient. Explain why that might make sense in this case. Rename the column heading for the new residuals to resid2. Now, using the instructions from earlier, run a Moran’s I analysis in S-Plus for both sets of residuals and report whether one or both are autocorrelated. Does this result make sense?

 

  1. Display in Arc Map: Export using “Export to database” (this may require refreshing the interface by rechoosing MS Access Database as the Data Target and sample.props as the data frame), choosing only resid1, resid2, and PROPID as columns. Load that table in ArcGIS and join it to sample.props. Before plotting, get rid of the outline. Do this by right clicking in the classification window and hitting “properties for all symbols”. In the resulting window, double click on the point shown under “preview” and in the subsequent window uncheck “use outline.” Click OK twice. Then, back in the symbology window (where all points should be the same color), hit the classification button and choose the standard deviation classification method. Use whatever color ramp you want. Take a screen capture and caption it. Do this again for resid2. Interpret what these maps are showing. What do negative or positive values mean? Which appears to have more clustering of high and low values in space? Which appears more randomly distributed? What does it mean from a statistical perspective if residuals of similar value tend to be near each other?

 

  1. Assemble materials in document and upload.