Lab 2
Summarizing land
cover and LiDAR surfaces by parcel at a fine scale
Due Feb 4.
- Again
map a drive to \\zoofiles\gradgis.
Open Arc Catalog and in it copy the entire Randalstown directory from
within Data_2006\Database. This lab will involve large datasets so I
recommend that you copy the data to the temp drive and do your work from
there. At the end of the lab, copy the entire directory from the temp
directory back to your Zoo account.
- First
we will summarize building heights by building, just to get an idea of how
LiDAR data works. The intuitive thing to do is subtract the lidar_bare_r
layer from the lidar_first_r layer in raster calculator. However, there
are tiny errors in the data that will result in a bunch of artificial
“pits” or pixels with values less than zero. To deal with this we’ll use
the raster “conditional” function to essentially set any resulting pixels
that would have been less than zero to zero. Con uses the following
syntax: CON(if statement, then do statement, else
do statement). When using the raster calculator, remember not to type
things in but rather use the buttons. Your CON function should like
something like this:

- The
result will be a temporary calculation layer. Make that permanent by right
clicking on it in the TOC and clicking “make permanent.” Save it in your
Randalstown folder and call it lidar_diff_r. This layer gives us the
height of landscape features relative to the base land height. We’ll use
it to estimate building height. Load up the buildings_R layer from the
Randalstown geodatabase and then use zonal statistics to summarize height
by building. However, just like with last lab, you’ll need to create a
join field, so open the attribute table, create a new field called “join1”
and use the field calculator to set it equal to ObjectID. Then click
spatial analyst toolbar>>zonal statistics and in the interface make
buildings_R as zone dataset, join1 as the zone field, and choose
lidar_diff_r as the value raster. Check the “join output table” check box
and uncheck the “chart statistic” box. Save the output in your Randalstown
folder and call it “bldg.” Open the attribute table for buildings_R and
create a new integer field called “height.” Use the field calculator to
set it equal to “bldg.MAX.” Now you can remove joins on that layer. Now
open Arc Scene and load up buildings_R, lc_r, and lidar_bare_r. Double
click on the lc_r item in the TOC and go to the base heights tab. Click on
the radio button that says “obtain heights for layer from surface and
choose “lidar_bare_r”. Then go to the rendering tab and check “shade areal
features” and then finally go to the symbology tab and click import and
choose the .lyr file we used before called gf_cemerge_1999_6cls. The click
OK. Now double click on buildings_r in the TOC and under “base heights”
click the radio button for “obtain heights for layer from:” choosing
lidar_bare_R (which it probably already has listed in the text box). Then
click the extrusion tab and check the “extrude features in layer check
box”. In the Extrusion value window below, click the calculator icon to
the right and click on “height.” It should now look like this.
. Under
rendering tab check the “shade areal features” box.” In symbology, choose
a color that contrasts with the background. Click OK and then take a
screencapture of what you see in 3D perspective. Note that lidar_bare_r is
never checked in the Table of Contents.
- Now we
will estimate the height of the tree canopy. First we’ll create a raster
layer giving the height of trees only. This time we’ll use a command call
SetNull that does something similar to CON, but specifically when you want
to return a null value for some of the pixels. In this case we want to set
all pixels without a Land cover code of 2 (trees) to null, so they don’t
appear at all. Open the calculator and write SetNull([LC_R] <> 2, [lidar_diff_r]). Note that
<> means “not equal to.” What this means is, if land cover is not
type 2 (trees), then the pixel value is set to null, otherwise to
lidar_diff. This will result in a temporary calculation layer that you can
make permanent by right clicking and clicking “make permanent. Call the
output Lidar_trees_r, remove the calculation and load up the newly named
layer. Make a hillshade of the Lidar_tree_r layer in spatial
analyst>>surface>>hillshade, and in the TOC, move the
hillshade right below the trees layer. Set the symbology of the
lidar_tree_r layer to green graduated color and transparency (under
display tab) to 25%. Now, the trees should like nice and textured. Display
this overlaid on parcels and buildings and screencapture the result.
- Now summarize
tree cover percentage and height by parcel. Let’s start by making a raster
1/0 layer showing tree pixels only. In raster calculator type [LC_R] == 2. You won’t need this any more after
this so you don’t have to make this calculation layer permanent. But you
will use it now for zonal statistics. Double click on the calculation
layer in the TOC and in the general tab change the name to “tree.” Go to
spatial analyst>>zonal statistics and choose parcels_R as the zone
dataset join1 as the zone field, tree as the value raster, check the “join
output table box”, uncheck the chart box, and save the output table in
your Randalstown folder as tree. Open the attribute table for the parcel
layer from the TOC then, create a new field set to Double called P_tree . Right click on the P_tree field heading and
click the field calculator. Set the new field equal to “tree.MEAN.” Then
right click and hit joins>>remove all joins. Next we’ll summarize
the tree height by parcel. Again use zonal statistics, only this time
choose “lidar_trees_r” as the input raster. Call the output table
treeht.dbf. Make two new fields (set to double), one called mean_ht and
one called max_ht. Use the field calculator and…you guessed it…calculate
the mean_ht field to be equal to treeht.MEAN and the max_ht field to be
equal to treeht.MAX. Again, remove all joins.Remove the temporary “trees”
1/0 layer.
- Now
let’s calculate some variables that we think might help explain variation
in tree height. First we’ll summarize distance to water, an obvious
predictor. Go ahead and load up Data_2006\Database\Hydrology\Hydrology.mdb\Surface_Water\NHD_Lines_GF
in your view (if it’s slow to load, copy it to your geodatabase). Also
load up the BG_R layer into your view. Now we’ll create a straight line distance
raster layer for the Randalstown area. First let’s set the boundaries for
the raster calculation. Go to Spatial Analyst>>options and click on
the extent tab and set the extent equal to BG_R. Next click Spatial Analyst>>Distance>>straight
line. Choose NHD_lines_GF as the distance to layer and choose an output
cell size of 3 meters. You can save the output as a temporary file. Then
do zonal statistics (spatial analyst>>zonal statistics) and set Parcels_R
as the zone dataset, ACCTID2 as the zone field, Distance to NHD lines (or
whatever you called your straight line distance raster layer) as the value
raster. Check “join output table to zone layer and uncheck chart
statistic. Call the output table wat.dbf. Click ok and then open the
attribute table for parcels_R. Add a new field (long integer) called
D2wat. Use the field calculator to set that new field equal to the “wat.Mean”
field. Then remove the joins for that layer (right click on the
layer>>joins>>remove). Now we’ll add another variable. We
imagine that properties that are located near empty lots (which tend to be
forested in this suburb—at least one of these empty lots is a large park) will
have bigger trees. Go to select by attribute for Parcels_R and type [bldg_type]
= '999' OR
[bldg_type] Is Null. This selects all parcels with no dwelling. Export
this to a new feature class in your geodatabase called no_dwel. No we want
to delete all these records from our Parcel_R layer. Without clearing the
selection, now go into edit mode (activate the Editor toolbar
and click Editor>start editing, making sure that Parcels_R is listed as
the target) and delete your selection by simply clicking the delete
button. If it doesn’t delete, then try redoing the attribute query in edit
mode and then hitting delete. Then click Editor>>stop editing and
save your edits. Get rid of the raster distance to water layer from your
view. Now let’s figure out how far each parcel is from an empty lot. Again
we’ll use straight line distance and zonal stats. This time use spatial
analyst to generate a layer giving straight line distance to the nearest empty
parcel with pixel size set to 3 meters. After generating this, then do
zonal statistics again, just like the last time, except your input raster
will be the layer you just created. Call the output table something like
nodwel.dbf. Again, create a new field in parcels_R, this time called
D2nodwel and set it to integer. Then, use the field calculator to set that
new field equal to nodwel.MEAN. Remove joins again.
- Now
close Arc Map and open SPlus. To import data from a Geodatabase go to
file>>import data>>from database. Under data source choose MS
Access Database and a browser window will come up. Browse to your
Randalstown geodatabase and then choose parcels_R under Table name. Keep
everything else as it is and click OK. You should now have a data table
with the attributes. Let’s add one more attribute: housing age (right now
year built is stored as a year). Open the table, scroll to the first empty
column and right click in the heading. Click insert column. Under the name
put Yearsold and under fill expression put 2005-YearBuilt. Now let’s run
some regressions. Go to statistics>>regression>>linear. Let’s
start with a univariate regression looking at the relationship between
house age and tree cover. Let’s regress percent
tree cover (P.tree) against years old. To do this click on P.Tree under
“dependent variable” and Yearsold under “independent.” You can manually
manipulate the formula where it says “formula.” Then, click
apply. Look at the results. What is the coefficient and P value on
Yearsold? What do these mean? What is the R squared value and what is that
a general measure of? Now, for comparison, try regression max.ht
(dependent) against Yearsold (independent). Which had a better model fit?
Finally, let’s do a multivariate regression. Let’s look at the relationship
between max tree height and the following independent variables: yrsold, d2nodwel, d2wat, shape.area and price.
Which of the variables are significant and which are not? How do you know
this? Drop the insignificant variable(s) and rerun the model. Now let’s
try independent variable transformations. We might hypothesize that the
effects of proximity to water and empty parcels may not be linear, but
rather diminish with increasing distance. This would entail a
log-transformation. Rewrite the model as max.ht~yrsold+log(d2nodwel)+log(d2wat)+Shape.Area.
Note what happens to the t-statistics and p-values for both of the logged
variables. Which one appears to improve in fit and which appears to
decrease in fit? Go ahead and rewrite the model so only the one term that
was improved by logging is still logged (that is, remove the “log” from
one of the terms—you can actually leave the parentheses). Before clicking apply go to the plot tab
and check “residuals vs. fit” and “residuals normal QQ.” Then click apply.
Copy and paste the results of this model and interpret what the
coefficients (in terms of sign and magnitude) and p values say about the
effect of each independent variable on the dependent. Does this result
make intuitive sense to you? Also report the R-squared and briefly say
what that indicates to you. Finally, copy and paste the two graphs (right
click>>copy on the graph sheets). Explain what these two graphs
suggest about how well this model fits the assumptions of normality and
constant variance in the regression model.
- Finish
up by combining your screencaptures and answers together in a document,
turn it into a PDF and upload.