I. History of Prior Research
Overview
The University of Vermont continues to develop projects that explore
the capture, access, navigation and use of digital facsimiles created from
primary source materials. The present project builds on this prior research,
as well as that of the Making of America project, the Colorado Digitization
project, the many projects of the Joint Information Systems Committee of
Great Britain, and others. A description of the more important prior research
at UVM follows.
A. The George Perkins Marsh Research Center
United States minister to the new Kingdom of Italy in 1861, lawyer,
businessman, scholar, language expert, and author of Man and Nature,
George Perkins Marsh was one of the first to recognize and describe
in detail the significance of human action in transforming the natural
world. The center provides transcriptions, annotations and images of 650
selected letters in Marsh's correspondence, as well as explanatory essays.
A generous grant from the Woodstock Foundation has supported this work.
B. Inventories of the University of Vermont's Manuscript Collections
A software grant from the Enigma Corporation, formerly Inso, allowed
UVM to establish its first two SGML-based text encoding projects. These
inventories, or Finding Aids, are EAD-encoded texts that allow for global
searching of UVM's
C. The Eugenics Collection (available soon at http://www.uvm.edu/~eugenics)
Funded by a grant from The Web Project of the Vermont Institute for
Science, Math and Technology, The Eugenics Collection brings together over
150 documents related to the eugenics movement in Vermont. Selected from
UVM's Special Collections, and a variety of state repositories, the documents
D. Fleming Museum Collection Catalog
E. Experimental Electronic Text Collections (History Review, Godey,
Alice B. Neal Haven)
II. Standards and Best Practices
To enhance interoperability with other digitization efforts, the project
will use established practices and standards where they exist.
The National Digital Library Federation, a program of the Council on
Library and Information Resources, has suggested three types of metadata
for digital surrogate collections: intellectual, structural, and administrative.
Intellectual metadata describes the content of each digital object for
purposes of cataloguing. Existing standards for intellectual metadata are
the library-based USMARC record and the Encoded Archival Description
(EAD) DTD for finding aids. Structural metadata is the information that
describes the internal organization of the digital object for purposes
of navigation. For example, in a digital facsimile of a book one might
wish to go to the next page, to the next chapter, or to the table of contents.
Administrative metadata records information about the digital object. This
includes technical specifications related to image capture, enhancements
to the digital object, information related to copyright and intellectual
property rights, and information that should remain with the object to
ensure its long term retention and use. Standards for structural and administrative
metadata being developed by the Making of America project will be used
for this project.
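The three metadata types described above can be sketched as a simple record. This is only an illustration of how intellectual, structural, and administrative metadata might sit alongside one digital object; the class and field names are hypothetical and are not taken from the Making of America specifications, USMARC, or the EAD DTD.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class StructuralMetadata:
    """Internal organization of one digitized book (hypothetical fields)."""
    pages: List[str]                                         # image files in reading order
    chapter_starts: List[int] = field(default_factory=list)  # ascending page indexes
    toc_page: Optional[int] = None                           # index of the table of contents

    def next_page(self, current: int) -> Optional[int]:
        """Index of the following page, or None on the last page."""
        nxt = current + 1
        return nxt if nxt < len(self.pages) else None

    def next_chapter(self, current: int) -> Optional[int]:
        """Index of the first chapter start after the current page, if any."""
        for start in self.chapter_starts:
            if start > current:
                return start
        return None


@dataclass
class DigitalObject:
    intellectual: Dict[str, str]     # cataloguing description (title, creator, ...)
    structural: StructuralMetadata   # navigation information
    administrative: Dict[str, str]   # capture settings, rights, retention notes


# Made-up example data, for illustration only.
book = DigitalObject(
    intellectual={"title": "Example volume"},
    structural=StructuralMetadata(
        pages=[f"page{n:03d}.tif" for n in range(6)],
        chapter_starts=[1, 4],
        toc_page=0,
    ),
    administrative={"capture": "1-bit, 600 dpi, uncompressed TIFF"},
)
```

With a record like this, "go to the next chapter" from page 1 is book.structural.next_chapter(1), and the table of contents is the page at book.structural.toc_page.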
Although there is no single standard for capture and storage of digital surrogates, a number of best practices are being developed that balance long-term preservation needs with current technical limitations. At a minimum, this project will follow the Technical Recommendations for Digital Imaging Projects from the Image Quality Working Group of ArchivesCom, a joint Libraries/AcIS committee at Columbia University (http://www.columbia.edu/acis/dl/imagespec.html). These recommendations call for capturing bitonal images (printed text, line drawings) at 1-bit, 600 effective dpi, to be stored as uncompressed TIFF files; 8-bit greyscale for black and white photos; and 24-bit color, 300 effective dpi, for color images. The capture process will depend on the location and nature of the original. A combination of digital SLR cameras, digital video cameras, and flat-bed scanners will be used.
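To give a rough sense of what these capture settings mean for storage planning, the following sketch estimates the uncompressed pixel data for one scanned page. It is an estimate under stated assumptions, not part of the Columbia recommendations: the 8.5 x 11 inch page size is an assumed example, the function and class names are hypothetical, TIFF header overhead is ignored, and no resolution is filled in for greyscale because the text above does not state one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CaptureSpec:
    bits_per_pixel: int
    dpi: Optional[int]  # None where the text above gives no resolution


# Values taken from the paragraph above; greyscale dpi is left unspecified.
SPECS = {
    "bitonal": CaptureSpec(bits_per_pixel=1, dpi=600),
    "greyscale": CaptureSpec(bits_per_pixel=8, dpi=None),
    "color": CaptureSpec(bits_per_pixel=24, dpi=300),
}


def uncompressed_bytes(width_in: float, height_in: float, kind: str,
                       dpi: Optional[int] = None) -> int:
    """Estimate uncompressed pixel data for one image, in bytes.

    Each row is rounded up to a whole byte; file headers are ignored.
    """
    spec = SPECS[kind]
    resolution = dpi if dpi is not None else spec.dpi
    if resolution is None:
        raise ValueError(f"no dpi given for {kind!r}; pass dpi explicitly")
    width_px = round(width_in * resolution)
    height_px = round(height_in * resolution)
    row_bytes = (width_px * spec.bits_per_pixel + 7) // 8
    return row_bytes * height_px


# Assumed example: a letter-size page (8.5 x 11 inches).
bitonal_page = uncompressed_bytes(8.5, 11, "bitonal")  # 4,210,800 bytes
color_page = uncompressed_bytes(8.5, 11, "color")      # 25,245,000 bytes
```

On these assumptions a bitonal page needs about 4 MB and a color page about 25 MB before any compression, which is the kind of figure that feeds into the storage requirements discussed under Technology Infrastructure.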
Note: specify that we will follow AMICO (see Colorado for link)
specifications for images.
The Burlington Agenda: Research Issues in Intellectual Access to Electronically Published Historical Documents, a report on a meeting funded by the University of Vermont and the National Historical Publications and Records Commission (NHPRC), points out the limitations of today's search engines in providing intellectual access to the contents of electronically published historical documents. In an effort to address these limitations, this project will also rely on the standards developed by the Text Encoding Initiative (TEI) and Model Editions Partnership (MEP) to encode documents.
Within the context of the project itself, procedural standards will
be developed for digital capture, encoding, and cataloguing to ensure that
participants within UVM and partners from other institutions can create
interoperable collections.
III. Technology Infrastructure
Creation of a secure, useable, and reliable digital surrogate archive
depends on robust technology that can
- zoo cluster
- storage space
- reliability (mirror, RAID 5, redundant)
- speed
- archiving (CDRW, tape)
- life cycle management (simple data refresh and data
migration)
software:
- continue funding maintenance of DynaText/DynaWeb for display/navigation
- open source tools for integration
- low-cost commercial products for XML/SGML tagging
- commercial products for image manipulation and OCR
IV. Training
- capture - training to standards
- cataloguing - library standards
- encoding - XML/SGML tools, TEI/MEP DTDs
- K-16 use of collection: tutorials, suggestions
V. Methodology
VI. Evaluation (see MOA)
- who will do it?
we need to:
- evaluate the appropriateness of the collection as well as the
structural and administrative metadata - is it valuable to students, scholars,
interested others?
- evaluate the ease of use including navigation, display, searching.
"will attempt to measure not only if the MoAII Testbed architecture
improved search capabilities and navigation options over their print
counterparts, but whether they encourage new ways of searching and
understanding the materials represented."
- technical evaluation of the architecture. performance issues,
scalability, connectivity of Z39.50/MARC/SGML
- scalability of training: can VT be taught to grow the archive?
work in (from MOA):
"For the proposed project, the IMG will employ classroom experiments,
online questionnaires, and various qualitative methods to assess the
feasibility and utility of network-based finding aids. The evaluation
plan will call for data collection, interpretation, and reporting at
three intervals: prior to implementation; at mid-project; and just
before project closure. IMG's participation will include evaluation
planning and administration, development of qualitative and
quantitative measures, data gathering, analysis, and reporting."
Links to things mentioned in this document:
- MOA II: planning docs, tech docs, etc.
- Colorado: http://coloradodigital.coalliance.org/projplan.html
- ARL DIGITAL INITIATIVES DATABASE
- MOA Cornell: http://moa.cit.cornell.edu/MOA/
- Columbia, assorted projects
- Berkeley, Digital Scriptorium
- NINCH, National Initiative for a Networked Cultural Heritage
- Berkeley, Digital Images and Text link page: includes articles and
papers, companies, resources, links
- Columbia: Technical Recommendations for Digital Imaging Projects
- JISC/HEDS
- TEI
- EAD
- MEP
- UVM etext collections