Aerial Videography & Land-Cover Mapping

Personnel: David Williams, Tracy Onega, Ernie Buford, David Capen, Chris Boget

As part of New England's Gap Analysis program, new methods for integrating aerial videography with land-use classification are being developed at the Spatial Analysis Lab (SAL), located in the Aiken Natural Resource Center at the University of Vermont. The following brief overview is taken from Joel Schlagel's report.

Aerial videographic imagery, with frame location identified using global positioning system receivers, has been demonstrated to be an efficient and cost-effective method for gathering ground-truth data for satellite image interpretation and post-classification accuracy assessment (Graham, 1993; Slaymaker et al., 1995).

Both the usefulness and effectiveness of air-video image interpretation can be enhanced by integrating video imagery with other spatial databases in either ERDAS or Arc/Info. The air-video interpretation system is being used for the development of refined land-cover maps for Vermont in support of vegetation mapping for the Gap Analysis program of the National Biological Survey. The system is also being used for an investigation of the effect of training sample density on the accuracy of image classification. A number of other applications are being developed.

The use of air-video in training site selection allows for a different approach to traditional field visits. Rather than seeking a minimally sufficient number of points, the air-video approach described by Graham (1993) seeks to develop extremely large training site data-sets. The collection and analysis of a large number of training sites can result in a significant reduction in the number of misclassified pixels under a variety of image processing techniques. Graham (1993) and Slaymaker et al. (1995) have reported very high accuracies in their land-cover mapping efforts, despite working in completely different terrain and using very different image processing methods. The common element was a training site data-set an order of magnitude larger than commonly used. Graham reported interpreting more than 11,000 sample points in Arizona, or about 1,000 points per TM scene, identifying 142 distinct vegetation classes. Slaymaker interpreted 18,000 points at 2,300 sample locations for a single TM scene in Massachusetts, identifying 42 vegetation classes, with an overall accuracy of 89% for 11 vegetation types at Anderson Level 3.

Building on the approaches of Graham and Slaymaker, the Vermont Cooperative Fish and Wildlife Unit will implement air-video interpretation for land-cover mapping in Vermont as part of the National Biological Service Gap Analysis Project (Scott, 1993). To facilitate and enhance the usefulness of air video, an interpretation station that integrates video, image, and GIS data has been developed.

A more recent study was conducted by Eric L. Lambert (former SAL staff):

A Pilot Study: Franklin County, Vermont

To better develop the methods by which aerial videography and image processing can be integrated to build very large sets of training samples, a pilot study was carried out. The study used a subset of a full Landsat TM scene from northwest Vermont, dated October 6, 1992. The area covered by this subsetted scene is approximately 260,000 hectares (1 hectare = 10,000 square meters). In addition, ancillary data such as stream networks, major roadways, and town and county boundaries were clipped to conform with this study area.
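
As a rough check on the scale of the study area, the hectare figure can be converted to an approximate TM pixel count. This is a minimal sketch that assumes a nominal 30-meter square pixel (the size mentioned later in this report); the function name is illustrative only.

```python
HECTARE_M2 = 10_000  # 1 hectare = 10,000 square meters


def approx_pixel_count(area_ha: float, pixel_size_m: float = 30.0) -> int:
    """Approximate number of square pixels needed to cover an area in hectares."""
    area_m2 = area_ha * HECTARE_M2
    return round(area_m2 / (pixel_size_m ** 2))


if __name__ == "__main__":
    # 260,000 ha at an assumed 30 m pixel size
    print(approx_pixel_count(260_000))
```

Under these assumptions the subset holds on the order of a few million pixels, which gives a sense of how many pixels each training sample has to represent.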

Within Imagine (Version 8.2), an unsupervised classification was first performed using all 6 bands (1-5 and 7) of the TM subset. The result of this classification was a thematic map of 50 classes. This map served as a starting point in our initial attempts at defining a set of spectrally unique class signatures for the approximately 10-15 unique land-use and forest types.
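
To illustrate the general idea of unsupervised clustering of pixels by their band values, here is a minimal sketch using plain k-means. It is a simplified stand-in and is not the isodata routine in Imagine; the function names, the deterministic choice of starting centers, and the tiny two-band data set in the usage note are all assumptions made for illustration.

```python
import math


def nearest(centers, pixel):
    """Index of the cluster center closest to the pixel in spectral space."""
    return min(range(len(centers)), key=lambda k: math.dist(centers[k], pixel))


def kmeans(pixels, n_classes, n_iter=20):
    """Cluster band-value tuples into n_classes groups.

    Returns (labels, centers), where labels[i] is the cluster index of pixels[i].
    """
    if len(pixels) < n_classes:
        raise ValueError("need at least as many pixels as classes")
    # Deterministic start (an assumption): evenly spaced pixels become the initial centers.
    step = len(pixels) // n_classes
    centers = [tuple(pixels[i]) for i in range(0, step * n_classes, step)]
    labels = [0] * len(pixels)
    for _ in range(n_iter):
        # Assign every pixel to its nearest center.
        labels = [nearest(centers, p) for p in pixels]
        # Recompute each center as the per-band mean of its members.
        new_centers = []
        for k in range(n_classes):
            members = [p for p, lab in zip(pixels, labels) if lab == k]
            if members:
                new_centers.append(
                    tuple(sum(band) / len(members) for band in zip(*members))
                )
            else:
                new_centers.append(centers[k])  # keep the center of an empty cluster
        if new_centers == centers:
            break  # assignments have stabilized
        centers = new_centers
    return labels, centers
```

For the pilot study the equivalent call would use 6-value tuples (one value per TM band) and `n_classes=50`. A toy call such as `kmeans([(10, 10), (11, 9), (9, 11), (200, 210), (205, 200), (198, 202)], 2)` separates the dark and bright pixels into two clusters.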

GPS data from two flight paths (June 4, 1994 & May 15, 1995) were differentially corrected using data recorded at the GPS base station located in Aiken Center (UVM). These two series of points were then overlaid on the TM scene (Figures 1 and 2). We currently have air video taken during three primary time periods:

Figure 1. ERDAS Imagine viewer showing a section of the October 1992 TM scene (bands 3,2,1) with GPS points from the June 4, 1994 (red) and May 15, 1995 (blue) flights overlaid. The white "inquire cursor box" shows the area magnified in Figure 2.

Figure 2. ERDAS Imagine viewer depicting a magnified section of the Landsat TM scene with GPS points from the June 4, 1994 flight (red) overlaid. Note the single highlighted point, which corresponds to the selected record shown in Figure 4.

Identifying the land-use types associated with each of the 50 class signatures from the unsupervised (isodata) classification involved several steps. First, both the wide-angle and zoomed videos were viewed on two monitors. The videos were then stopped at frames where the land use within the zoomed frame was considered homogeneous. Homogeneity matters because the zoomed video frame covers approximately the same ground area as one pixel of the TM image (30 meters), so a homogeneous frame should represent a single land-use type for that pixel. The time code reference for the video frame was then read off the video monitor (Figure 3).

Using the time code from the video frame of interest, the vector attribute file associated with the series of GPS points taken during the videography flight was searched to find the matching point (Figure 4).
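
The time-code search described above can be sketched as a nearest-time lookup. This is a minimal illustration, not the procedure used in Imagine or Arc/Info: the `GpsPoint` record layout, the `HH:MM:SS` or `HH:MM:SS:FF` time-code format, and the one-second tolerance are all assumptions.

```python
from bisect import bisect_left
from typing import NamedTuple, Optional, Sequence


class GpsPoint(NamedTuple):
    """Hypothetical attribute record for one GPS fix."""
    point_id: int
    seconds: int  # time of the fix, in seconds since midnight
    x: float
    y: float


def to_seconds(time_code: str) -> int:
    """Convert 'HH:MM:SS' or 'HH:MM:SS:FF' to whole seconds, ignoring any frame field."""
    hours, minutes, seconds = (int(part) for part in time_code.split(":")[:3])
    return hours * 3600 + minutes * 60 + seconds


def find_matching_point(
    points: Sequence[GpsPoint], time_code: str, tolerance_s: int = 1
) -> Optional[GpsPoint]:
    """Return the GPS point nearest in time to the video time code.

    `points` must be sorted by `seconds`. Returns None when no point lies
    within `tolerance_s` seconds of the time code.
    """
    target = to_seconds(time_code)
    times = [p.seconds for p in points]
    i = bisect_left(times, target)
    # Only the neighbors on either side of the insertion index can be nearest.
    candidates = points[max(i - 1, 0): i + 1]
    if not candidates:
        return None
    best = min(candidates, key=lambda p: abs(p.seconds - target))
    return best if abs(best.seconds - target) <= tolerance_s else None
```

The tolerance guards against pairing a frame with a GPS fix recorded during a different part of the flight; an appropriate value would depend on the actual GPS logging interval.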

Figure 3. A sample wide-angle video frame showing time code reference.

Figure 4. A part of the ERDAS Imagine table which lists the point attributes from the Arc/Info file associated with the series of GPS points from the June 4th flight. Note the single record highlighted in yellow. This record corresponds to the point which is highlighted in Figures 1 and 2.

When the matching point is located and selected in the attribute table, the GPS point is similarly highlighted on the TM scene (see Figures 1 & 2). Using another Imagine viewer containing an image of the isodata thematic map, the isodata class number which underlies the selected GPS point is determined. This isodata class is then labeled with an appropriate land-use type. For forested frames, more detailed notes on species composition are also recorded.
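
Finding the class that underlies a GPS point amounts to converting map coordinates to a raster row and column. The sketch below assumes a north-up thematic raster with square pixels, stored as a list of rows, whose upper-left corner sits at (`origin_x`, `origin_y`) in the same map units as the GPS points. These names and the storage layout are hypothetical; in the study this lookup was done interactively in the Imagine viewer.

```python
import math


def map_to_pixel(x, y, origin_x, origin_y, pixel_size=30.0):
    """Return (row, col) of the pixel containing map coordinate (x, y).

    Assumes a north-up raster: columns increase eastward from origin_x,
    rows increase southward from origin_y (the upper-left corner).
    """
    col = math.floor((x - origin_x) / pixel_size)
    row = math.floor((origin_y - y) / pixel_size)
    return row, col


def class_at_point(class_map, x, y, origin_x, origin_y, pixel_size=30.0):
    """Return the thematic class under (x, y), or None if the point is off the map."""
    row, col = map_to_pixel(x, y, origin_x, origin_y, pixel_size)
    if 0 <= row < len(class_map) and 0 <= col < len(class_map[0]):
        return class_map[row][col]
    return None
```

For example, with a 2 x 3 class map whose upper-left corner is at (1000, 2060), the point (1045, 2020) falls in row 1, column 1.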

This process is then iterated many (hundreds to thousands) times in order to accumulate an appropriate number of samples for each unique land use type and isodata class.
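
One way to keep track of the accumulating samples is to tally the interpreted land-use labels per isodata class. The sketch below is an assumption about bookkeeping, not the method used in the study: the majority-vote rule, the `min_samples` threshold, and the example labels are all hypothetical.

```python
from collections import Counter, defaultdict


def tally_labels(samples):
    """Count land-use labels per isodata class.

    `samples` is an iterable of (isodata_class, land_use_label) pairs,
    one pair per interpreted video frame.
    """
    counts = defaultdict(Counter)
    for isodata_class, label in samples:
        counts[isodata_class][label] += 1
    return counts


def majority_labels(counts, min_samples=1):
    """Map each isodata class to (most frequent label, share of its samples).

    Classes with fewer than `min_samples` interpreted frames are left out,
    signalling that more video frames are needed for them.
    """
    result = {}
    for isodata_class, counter in counts.items():
        total = sum(counter.values())
        if total < min_samples:
            continue
        label, n = counter.most_common(1)[0]
        result[isodata_class] = (label, n / total)
    return result
```

A low share for the most frequent label would suggest that an isodata class is spectrally mixed and may need to be split or handled separately.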


While this process is still ongoing, the initial results are encouraging. It is possible to "visit" almost 100 sites in a day using these integrated videography techniques. The number of training signatures available for any type of supervised classification is significantly (10-20 times) larger than could be expected from traditional ground-truthing methods. As a result, the accuracy of the final land-use classification can be substantially improved while using far less time (and money) than would be needed to acquire the same number of training sites without integrated videography.


Graham, L.E. 1993. Airborne Video for Near Real-Time Resource Applications. Journal of Forestry 91(August):28-32.

Slaymaker, D.M., K.M.L. Jones, C.R. Griffin, and J.T. Finn. 1995. Mapping Deciduous Forests in New England Using Aerial Videography and Multi-Temporal Landsat TM Imagery. In review.

       Updated: 30 June 2000