TYPE OF PROPOSAL: poster or project demonstration.

TITLE: We Built it. Where are They?: Text Encoding and the Humanities Scholar

AUTHOR: Hope Greenberg

AFFILIATION: University of Vermont

E-MAIL: hope.greenberg@uvm.edu

In The Diffusion of Innovations, Everett Rogers characterizes the adopters of new technologies as innovators, early adopters, early and late majority, and laggards. Building on this idea, and on Moore's conclusion that a chasm exists between the first two groups and subsequent groups, Geoghegan suggests that early adopters have a high comfort level with technology, value innovation, and are willing to experiment. The majority, however, values minimal risk, requires clear and compelling pedagogical value before attempting a new technology, does not wish current practices to be constrained, and eschews technology for technology's sake.

Since its first 'official' appearance in 1994, the Text Encoding Initiative's Guidelines for Electronic Text Encoding and Interchange has had a profound impact on the innovators and early adopters of the electronic text world. Large text collections and scholarly projects already versed in SGML, like the University of Virginia's Electronic Text Center and The Rossetti Archive, and other projects at the Institute for Advanced Technology in the Humanities, have adopted it. Others, like the Model Editions Partnership and the Making of America project, are exploring ways to apply it to historical documents. The MLA recommends the TEI as the encoding method of choice in its Guidelines for Electronic Scholarly Editions. Text collections that focus on a particular topic or era, such as the Brown Women Writers, Victorian Women Writers, and the ORLANDO projects, as well as several thesaurus projects, have found that it supports building a collection that might not otherwise be possible in the print world.

A quick perusal of the list of projects using the TEI, as collected at the TEI web site, shows that many are large-scale, specifically funded projects hosted by libraries or consortia. Given the technical difficulties associated with creating these documents, it is not surprising that early adopters tend to be those with resources to devote exclusively to these projects. Nor is it surprising that several groups are attempting to provide materials to assist scholars and students who are interested in undertaking projects of this type (cf. University of Virginia Electronic Text Center, Brown University Scholarly Technologies Group, etc.), providing information on their encoding practices (cf. British Women Romantic Poets Project, Victorian Women Writers Project, etc.), or formulating best practices for creating digital resources in general (cf. Arts and Humanities Data Service Guides).

Individual scholars and initially small-scale projects are not entirely absent. The fine work begun by Dr. Stuart Lee as a web-based tutorial on the poetry of Isaac Rosenberg, which grew into the Virtual Seminars for Teaching Literature, as well as projects like The Walt Whitman Archive and others supported by IATH, all point to the pedagogical benefits to be obtained from individual scholars creating these texts.

But can the chasm between the TEI early adopters and the majority of scholars be bridged? Is there a compelling pedagogical benefit to the individual humanities scholar in creating TEI-encoded electronic texts, or is this a job best left to large projects, consortia, publishers, and libraries? Should individual scholars be creators or simply consumers?

The Godey's Lady's Book site at the University of Vermont attempts to explore these questions. Before compelling benefits can be derived from the creation of individual projects, we must determine if the creation of such projects is feasible given limited time and resources. That is, can the creation of electronic texts fit within the requirements for adoption by the majority as described by Geoghegan?

Drawing on the models available, this project is creating lightly encoded texts from a popular mid-19th century American magazine. Like the Making of America project, the contents are page images backed by hidden OCR'd text. The accuracy of the OCR process is insufficient to provide reliable transcriptions but sufficient to provide reasonably good indexing for purposes of searching. The indexing and display software currently used are DynaText and DynaWeb, which were acquired through Inso Corporation's educational grant program. Along with the texts themselves will be ancillary documents related to the Book's contents, supporting documents describing the encoding, and a tutorial for creating similar documents.

In choosing to create the digital Godey's Lady's Book, several issues were considered:

At each phase in creating the digital Godey's Lady's Book, the question kept at the forefront is: can this model be duplicated by individuals or small groups with limited resources while remaining in concert with, and informed by, the broader text encoding world? Unless the answer is yes, the chasm between early adopters and the majority of humanities scholars may well remain unbridgeable.


