Lab 2

Summarizing land cover and LiDAR surfaces by parcel at a fine scale

Due Feb 4.

 

 

  1. Again map a drive to \\zoofiles\gradgis. Open Arc Catalog and in it copy the entire Randalstown directory from within Data_2006\Database. This lab will involve large datasets so I recommend that you copy the data to the temp drive and do your work from there. At the end of the lab, copy the entire directory from the temp directory back to your Zoo account.
  2. First we will summarize building heights by building, just to get an idea of how LiDAR data works. The intuitive thing to do is subtract the lidar_bare_r layer from the lidar_first_r layer in raster calculator. However, there are tiny errors in the data that will result in a bunch of artificial “pits” or pixels with values less than zero. To deal with this we’ll use the raster “conditional” function to essentially set any resulting pixels that would have been less than zero to zero. Con uses the following syntax: CON(if statement, then do statement, else do statement). When using the raster calculator, remember not to type things in but rather use the buttons. Your CON function should like something like this:

 

  1. The result will be a temporary calculation layer. Make that permanent by right clicking on it in the TOC and clicking “make permanent.” Save it in your Randalstown folder and call it lidar_diff_r. This layer gives us the height of landscape features relative to the base land height. We’ll use it to estimate building height. Load up the buildings_R layer from the Randalstown geodatabase and then use zonal statistics to summarize height by building. However, just like with last lab, you’ll need to create a join field, so open the attribute table, create a new field called “join1” and use the field calculator to set it equal to ObjectID. Then click spatial analyst toolbar>>zonal statistics and in the interface make buildings_R as zone dataset, join1 as the zone field, and choose lidar_diff_r as the value raster. Check the “join output table” check box and uncheck the “chart statistic” box. Save the output in your Randalstown folder and call it “bldg.” Open the attribute table for buildings_R and create a new integer field called “height.” Use the field calculator to set it equal to “bldg.MAX.” Now you can remove joins on that layer. Now open Arc Scene and load up buildings_R, lc_r, and lidar_bare_r. Double click on the lc_r item in the TOC and go to the base heights tab. Click on the radio button that says “obtain heights for layer from surface and choose “lidar_bare_r”. Then go to the rendering tab and check “shade areal features” and then finally go to the symbology tab and click import and choose the .lyr file we used before called gf_cemerge_1999_6cls. The click OK. Now double click on buildings_r in the TOC and under “base heights” click the radio button for “obtain heights for layer from:” choosing lidar_bare_R (which it probably already has listed in the text box). Then click the extrusion tab and check the “extrude features in layer check box”. In the Extrusion value window below, click the calculator icon to the right and click on “height.” It should now look like this. . Under rendering tab check the “shade areal features” box.” In symbology, choose a color that contrasts with the background. Click OK and then take a screencapture of what you see in 3D perspective. Note that lidar_bare_r is never checked in the Table of Contents.
  2. Now we will estimate the height of the tree canopy. First we’ll create a raster layer giving the height of trees only. This time we’ll use a command call SetNull that does something similar to CON, but specifically when you want to return a null value for some of the pixels. In this case we want to set all pixels without a Land cover code of 2 (trees) to null, so they don’t appear at all. Open the calculator and write SetNull([LC_R]  <> 2, [lidar_diff_r]). Note that <> means “not equal to.” What this means is, if land cover is not type 2 (trees), then the pixel value is set to null, otherwise to lidar_diff. This will result in a temporary calculation layer that you can make permanent by right clicking and clicking “make permanent. Call the output Lidar_trees_r, remove the calculation and load up the newly named layer. Make a hillshade of the Lidar_tree_r layer in spatial analyst>>surface>>hillshade, and in the TOC, move the hillshade right below the trees layer. Set the symbology of the lidar_tree_r layer to green graduated color and transparency (under display tab) to 25%. Now, the trees should like nice and textured. Display this overlaid on parcels and buildings and screencapture the result. 
  3. Now summarize tree cover percentage and height by parcel. Let’s start by making a raster 1/0 layer showing tree pixels only. In raster calculator type [LC_R] == 2. You won’t need this any more after this so you don’t have to make this calculation layer permanent. But you will use it now for zonal statistics. Double click on the calculation layer in the TOC and in the general tab change the name to “tree.” Go to spatial analyst>>zonal statistics and choose parcels_R as the zone dataset join1 as the zone field, tree as the value raster, check the “join output table box”, uncheck the chart box, and save the output table in your Randalstown folder as tree. Open the attribute table for the parcel layer from the TOC then, create a new field set to Double called P_tree . Right click on the P_tree field heading and click the field calculator. Set the new field equal to “tree.MEAN.” Then right click and hit joins>>remove all joins. Next we’ll summarize the tree height by parcel. Again use zonal statistics, only this time choose “lidar_trees_r” as the input raster. Call the output table treeht.dbf. Make two new fields (set to double), one called mean_ht and one called max_ht. Use the field calculator and…you guessed it…calculate the mean_ht field to be equal to treeht.MEAN and the max_ht field to be equal to treeht.MAX. Again, remove all joins.Remove the temporary “trees” 1/0 layer.
  4. Now let’s calculate some variables that we think might help explain variation in tree height. First we’ll summarize distance to water, an obvious predictor. Go ahead and load up Data_2006\Database\Hydrology\Hydrology.mdb\Surface_Water\NHD_Lines_GF in your view (if it’s slow to load, copy it to your geodatabase). Also load up the BG_R layer into your view.  Now we’ll create a straight line distance raster layer for the Randalstown area. First let’s set the boundaries for the raster calculation. Go to Spatial Analyst>>options and click on the extent tab and set the extent equal to BG_R. Next click Spatial Analyst>>Distance>>straight line. Choose NHD_lines_GF as the distance to layer and choose an output cell size of 3 meters. You can save the output as a temporary file. Then do zonal statistics (spatial analyst>>zonal statistics) and set Parcels_R as the zone dataset, ACCTID2 as the zone field, Distance to NHD lines (or whatever you called your straight line distance raster layer) as the value raster. Check “join output table to zone layer and uncheck chart statistic. Call the output table wat.dbf. Click ok and then open the attribute table for parcels_R. Add a new field (long integer) called D2wat. Use the field calculator to set that new field equal to the “wat.Mean” field. Then remove the joins for that layer (right click on the layer>>joins>>remove). Now we’ll add another variable. We imagine that properties that are located near empty lots (which tend to be forested in this suburb—at least one of these empty lots is a large park) will have bigger trees. Go to select by attribute for Parcels_R and type [bldg_type] =  '999' OR [bldg_type] Is Null. This selects all parcels with no dwelling. Export this to a new feature class in your geodatabase called no_dwel. No we want to delete all these records from our Parcel_R layer. Without clearing the selection, now go into edit mode (activate the Editor toolbar and click Editor>start editing, making sure that Parcels_R is listed as the target) and delete your selection by simply clicking the delete button. If it doesn’t delete, then try redoing the attribute query in edit mode and then hitting delete. Then click Editor>>stop editing and save your edits. Get rid of the raster distance to water layer from your view. Now let’s figure out how far each parcel is from an empty lot. Again we’ll use straight line distance and zonal stats. This time use spatial analyst to generate a layer giving straight line distance to the nearest empty parcel with pixel size set to 3 meters. After generating this, then do zonal statistics again, just like the last time, except your input raster will be the layer you just created. Call the output table something like nodwel.dbf. Again, create a new field in parcels_R, this time called D2nodwel and set it to integer. Then, use the field calculator to set that new field equal to nodwel.MEAN. Remove joins again.
  5. Now close Arc Map and open SPlus. To import data from a Geodatabase go to file>>import data>>from database. Under data source choose MS Access Database and a browser window will come up. Browse to your Randalstown geodatabase and then choose parcels_R under Table name. Keep everything else as it is and click OK. You should now have a data table with the attributes. Let’s add one more attribute: housing age (right now year built is stored as a year). Open the table, scroll to the first empty column and right click in the heading. Click insert column. Under the name put Yearsold and under fill expression put 2005-YearBuilt. Now let’s run some regressions. Go to statistics>>regression>>linear. Let’s start with a univariate regression looking at the relationship between house age and tree cover. Let’s regress percent tree cover (P.tree) against years old. To do this click on P.Tree under “dependent variable” and Yearsold under “independent.” You can manually manipulate the formula where it says “formula.” Then, click apply. Look at the results. What is the coefficient and P value on Yearsold? What do these mean? What is the R squared value and what is that a general measure of? Now, for comparison, try regression max.ht (dependent) against Yearsold (independent). Which had a better model fit? Finally, let’s do a multivariate regression. Let’s look at the relationship between max tree height and the following independent variables:  yrsold, d2nodwel, d2wat, shape.area and price. Which of the variables are significant and which are not? How do you know this? Drop the insignificant variable(s) and rerun the model. Now let’s try independent variable transformations. We might hypothesize that the effects of proximity to water and empty parcels may not be linear, but rather diminish with increasing distance. This would entail a log-transformation. Rewrite the model as max.ht~yrsold+log(d2nodwel)+log(d2wat)+Shape.Area. Note what happens to the t-statistics and p-values for both of the logged variables. Which one appears to improve in fit and which appears to decrease in fit? Go ahead and rewrite the model so only the one term that was improved by logging is still logged (that is, remove the “log” from one of the terms—you can actually leave the parentheses).  Before clicking apply go to the plot tab and check “residuals vs. fit” and “residuals normal QQ.” Then click apply. Copy and paste the results of this model and interpret what the coefficients (in terms of sign and magnitude) and p values say about the effect of each independent variable on the dependent. Does this result make intuitive sense to you? Also report the R-squared and briefly say what that indicates to you. Finally, copy and paste the two graphs (right click>>copy on the graph sheets). Explain what these two graphs suggest about how well this model fits the assumptions of normality and constant variance in the regression model.
  6. Finish up by combining your screencaptures and answers together in a document, turn it into a PDF and upload.