Nailing Down the Jello: Three years of running a specialized Social Science Reference Web Site

Professor Craig McKie
Dept of Sociology
Carleton University

http://www.socsciresearch.com/


ABSTRACT

In the fall of 1994, I began to offer my graduate students a list of 10 Web sites on paper. These sites were in my view useful resources in their research activities. I converted this short list to an HTML document shortly thereafter and placed it in my personal Web files area. I registered this page with Yahoo a few weeks after it started up its activities, when it had no more than 2000 sites listed and additions were being done manually. Since that time, the Research Engines site has grown to a 130k page of links useful to the social science research endeavour. It continues to develop as the principal high volume social science reference site with a mirror at UNESCO in Paris. Experience with the page demanded several reorganizations for the sake of ease of use and ease of maintenance. It now has 18 subsections, roughly based on traditional disciplinary lines with some interesting exceptions. Volume measurements were always a part of the design. The number of hits now routinely exceeds 1000 per day and has exceeded 2000 per day in the exam period in early spring. These visits come from all over the world and transcend the social science research community (the intended audience) in several interesting ways. An experiment with an online questionnaire attached to the page carried out in the spring of 1996 established the feasibility of online questioning and automated data capture while yielding rudimentary demographic data on users. Insights gained from this experience are related to the Web editorial role (selection of content), obligations for currency (validation of links and content -- the nailing jello problem), the manner in which help and assistance can be obtained, 'surveillance' of new content to maintain currency, and the problem of institutional identities, support and advertising. Wider issues for academic users are: citation principles, establishing provenance, attribution and date stamping and of course the wider issue of copyright.


I am here today to talk about my experiences in inventing, developing, administering and (when it got too big) migrating a social science reference site over the past three years. I am characterizing the process as akin to nailing Jell-O to a wall since a permanent state of flux is the rule of the day, at least in my experience. Links go dead without notice; ISPs change operating systems and versions of the web server software without notice; they change the location of web logs without notice; and important new links and sites appear most often without notice. The net effect is not the familiar thought of Chairman Gates "where would you like to go today?" but rather "where is where you would like to go today?"

History & Experience

In the fall of 1994, I handed out on paper a list of ten useful resource web sites in my graduate seminar in research design and analysis. I had been putting off the major day of reckoning when I was going to learn how to set up a gopher site. Almost as an afterthought to the distribution of the piece of paper, I abandoned the gopher site idea and undertook to set up a small web site in my account area which they could then use to easily access the ten sites. I got on the phone to likely advisers in the university community and they told me how to set up a public files area in my UNIX account and how to set the permissions for the files so that others could use them. I really at that stage had no idea what I was letting myself in for. I told my class about it in the next class period and urged the students to try it out. Reaction was good.

The following Friday evening, the baseball season having ended prematurely, I decided to spruce the site up a bit. I visited a nice looking site and copied the code in toto. This allowed me to deduce how one embedded images and set backgrounds and chose colours and so on. Then I hauled out Corel Draw and experimented with design. I chose lurid colours which people later on loved to complain about (generally they had 256k video cards and 16 unsatisfactory colours to work with in their displays). The dark purple colours drove such visitors wild since they couldn't see anything except swirls of purple and black. They did not hesitate to inform me of this discomfort, which indirectly let me know that people out there were using the site. I had dubbed it the "Universal Codex for the Social Sciences" as, I thought at the time, a not very subtle attempt at humour, given the meagre number of sites which was then included. I also began a parallel site completely unenhanced with colourful distractions for the video card disadvantaged.

When the list reached twenty or so uncategorized links, I decided to submit it to the Yahoo site. At that time it barely had 2000 links listed. I had found a reference to it on an underground Japanese web page, two or three weeks after it started up. In due course I got back a personal message from the proprietors then operating Yahoo out of a dorm room at Stanford or Berkeley. To my everlasting regret, this message has been lost. I would now rank it with my signed artwork for Byte Magazine covers from 1979 in collectibles and sentimental value. Today, my research resources site, the direct successor to the original page, is the most consulted link for the social sciences at Yahoo. But a lot of water has gone over the dam since those early days barely three years ago. I have had to learn on the fly as it were and this afternoon I would like to share some of my insights with you here.

The Present

Since the early days, the Research Resources page has been through many changes, most notably the introduction of 18 subsections. Its total size has grown to 130k of links and the next version will add about 30k more as it will be greatly expanded and augmented. The original site is fixed in content but I am still getting 1000 hits a day on the colourful and the plain vanilla versions I left in place there. The mirror site at UNESCO in Paris is heavily used and the main site in Toronto has experienced heavy traffic as well though I do not have log analysis software at either of the latter two sites to understand the nature of the traffic.

The traffic does come from all over the world and transcends the intended user group, the social science research community. I get emailed requests for assistance regularly only some of which I can find the time to respond to. Still, the origins of these requests are intriguing. I particularly remember one from the research department at the World Service at the BBC in London for instance.
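Where the raw server logs are available, the two measurements mentioned here (hits per day and the geographic spread of visitors) can be approximated with very little code. The following is a minimal sketch only. It assumes the server writes the Common Log Format, which the text does not state, and it treats the last label of each visitor's hostname as a rough indicator of origin. The function name `summarize` and the sample hostnames are hypothetical.

```python
import re
from collections import Counter

# Assumed layout (Common Log Format):
# host ident user [day/month/year:time zone] "request" status bytes
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<day>[^:\]]+):[^\]]*\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+'
)


def summarize(lines):
    """Count hits per calendar day and per top-level domain of the visitor."""
    hits_per_day = Counter()
    hits_per_domain = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed format
        hits_per_day[match["day"]] += 1
        last_label = match["host"].rsplit(".", 1)[-1].lower()
        # A numeric last label means the address was never resolved to a name.
        domain = "unresolved" if last_label.isdigit() else last_label
        hits_per_domain[domain] += 1
    return hits_per_day, hits_per_domain


if __name__ == "__main__":
    # Hypothetical sample lines, not taken from any real log.
    sample = [
        'host1.example.uk - - [14/Oct/1997:10:01:02 -0400] "GET /index.html HTTP/1.0" 200 1043',
        'host2.example.org - - [14/Oct/1997:11:20:00 -0400] "GET /index.html HTTP/1.0" 200 1043',
        '192.0.2.7 - - [15/Oct/1997:09:00:00 -0400] "GET /plain.html HTTP/1.0" 304 -',
    ]
    per_day, per_domain = summarize(sample)
    print(per_day.most_common())
    print(per_domain.most_common())
```

A count by top-level domain is a crude proxy for country of origin, since generic domains such as .com and .org say nothing about location, but it is enough to show whether traffic extends beyond the intended audience.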

Insights:

  1. Enlist your users

    The research resources page is a place where stranger-to-stranger information exchanges are mediated. When present, the strangers can be asked if they have anything to contribute to enrich the marketplace. Many do so.

  2. Validation sequences

    I have learned never to trust the address of an interesting looking link which falls into my hands. I test out every one of them for an initial editorial look-see. Then I test it again when it is entered on the revised page but before it is seen by the public. In addition, as often as I can manage, I run the page through the Dr.HTML site which checks for broken links, down servers and other misfortunes. It is just invaluable. It will only work for pages below a certain size limit, which was a powerful reason for me to split up the main file into manageable chunks.

  3. Take care with institutional facilities and identities

    I have always been concerned about this issue. A successful web site generates traffic for my university's server. If you are not sensitive to the traffic costs you incur, the administrators can be difficult. I had a no-advertising rule in anticipation of a negative reaction from my university hosts.

    When it became feasible and necessary to migrate and accept commercial sponsorship, I argued that I could not do this on university facilities and that the sponsorship message should be low-key and non-intrusive. I hope that I have succeeded in striking the right balance. I did for instance have to argue down the size and placement of the ads to tasteful dimensions.

    Having said this however, I do host some specific pages of research resources for immigration topics and refugee resources which were paid for by a government department (though their sponsorship is not acknowledged on the pages themselves).

  4. Spin-offs

    The creation and development of my research page led to my writing two "how-to" books and the second, published this summer by McGraw-Hill Ryerson [Using the Web for Social Research], in turn led me to accept commercial sponsorship for redevelopment of the graphics content on a commercial ISP where it now resides somewhere at an undisclosed location in downtown Toronto. The practical difficulties for me are that the files are now way off campus and I have to do the standard UNIX file transfers after revisions take place. This has greatly slowed the process. I understand that if your server is using Windows NT, then you can use FrontPage to manage your links effectively, but commercial ISPs often use a standard UNIX setup and in any case I did not have a say in the choice of a service provider. I am much impressed with FrontPage and what it can do and I think my site would be better if I could use it but I can't. It is however something to keep in mind.

    Other major sites have had spin-offs as well. SOSIG in England for instance published a user guide pamphlet and gives it away. They also run training sessions for which they charge. There is such a pressing necessity for training in the use of the Web tools that I think this avenue is very promising. And of course there is the Web course for credit concept, with which I do not yet feel at ease though perhaps some of you do. I am more than willing to try it out and a colleague in social work at Carleton is now giving a course in this fashion.

    For myself, the generation of new "how-to" books is an obvious way to go. With the recent advent in Canada of Data Liberation, there is for instance a crying need for something called "How to use Statistics Canada data". The data is now available in the academic community free of charge to the user (but not of course to the institutions); the problem is that very few students and faculty are aware of its strengths and limitations, and many may lack the skills to analyze it with statistical packages such as SAS or SPSS.

  5. Copyright

    There is starting to be litigation of the presumed "right" to post active links on your page to copyright material (such as for instance Associated Press Wire copy). I am aware of one case in Scotland and one in the southwestern United States where web site operators have been sued because their sites contained such links. In addition there is the "Radikal" case in Germany where a prominent Socialist legislator was charged with a criminal offense for having a link to the Dutch anarchist webzine Radikal. I took the step of putting the same link on my research resources page as a gesture of sympathy for the "felon". I understand the charges were subsequently dropped on a technicality but the principle of these lawsuits and charges is alarming. I will continue to operate as if active links are public domain material but I am not altogether sure whether this principle would be upheld in a court of law. Recent discussions of a new international convention on Internet practice (discussed notably by Ira Magaziner on behalf of the U.S. government) add to my unease. The current regime of 'lawlessness' is not likely to persist so we collectively had better make our views known on this matter. I am assuming most everyone associated with a high volume web site is in favour of a wide open regime of exchange but perhaps I am wrong on this. The copyright issue has an evil twin, the use of strong encryption and its legal status, but that is too big an issue and its ramifications too important and consequential to discuss here in any detail. Suffice to say that I am in favour of unlimited use of strong encryption by anyone who wants it (by which I mean encryption which cannot be broken by the security agencies of the state). It now exists but its use is being actively opposed by most Western governments who purport to fear its use by crooks, terrorists and pornographers.
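The validation sequence described under insight 2 (test each address before it is published, then re-check the whole page periodically) lends itself to a small script. What follows is a minimal sketch under stated assumptions, not the Dr.HTML service and not the procedure actually used for the site. The names `extract_links`, `fetch_status` and `find_broken_links` are hypothetical, and the sketch treats any HTTP status of 400 or above, or any unreachable server, as a broken link.

```python
from html.parser import HTMLParser
from http.client import HTTPException
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return all link targets found in the HTML text, in document order."""
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links


def fetch_status(url, timeout=10):
    """Return the HTTP status for url, or None if the server cannot be reached.

    Uses a HEAD request to avoid downloading the page body. Some servers
    reject HEAD, so a failure here is a prompt for a manual look, not proof.
    """
    request = Request(url, method="HEAD")
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status
    except HTTPError as err:
        return err.code  # server answered, but with an error status
    except (URLError, HTTPException, OSError, ValueError):
        return None  # down server, timeout, or malformed address


def find_broken_links(html, fetch=fetch_status):
    """Return (url, status) pairs for links that fail the check."""
    broken = []
    for url in extract_links(html):
        if not url.startswith(("http://", "https://")):
            continue  # skip mailto:, relative paths, and other schemes
        status = fetch(url)
        if status is None or status >= 400:
            broken.append((url, status))
    return broken
```

Passing the fetch function as a parameter lets the same check run against a stand-in during testing, before any real request is sent. Because this sketch has no page size limit of its own, splitting a large page into sections remains a matter of maintenance convenience only.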

Conclusions: What have I learned

Let me conclude with some general observations about the web site management and development process. Some of these follow from my previous remarks but some are more general yet:

  1. It's a lot of work. It is the farthest possible thing from the fire-and-forget smart munition. It takes constant feeding and alteration and you are prodded to do these by the clients.

  2. It's habit-forming in the sense that it becomes part of the routine. I used to operate on a weekly cycle of changes culminating as many things do in a big change of character on Friday afternoons. I now have abandoned this approach in favour of the big re-write at much more lengthy intervals. Big rewrites can be scheduled and done with more concentrated attention. This is an admission that the activity has become one of major importance (though not yet in the tenure/promotion process).

  3. It attracts attention. This is both good and bad (see above points). It definitely increases the amount of incoming email as strangers approach you for help.

  4. The role involved is almost exactly the same as that of the traditional editor. You are called upon to challenge the work of others, and if it withstands the challenge, then you accept it. It is worth saying that some resources do not meet my arbitrary unaccountable standards. My own personal judgment is thus by default being inflicted on an unsuspecting web world very much in the mode of the old print editor.

  5. Logging and analysis thereof is a continuing problem. I don't feel I know enough about the user community and their interests. My interests are therefore paramount. I did try one experiment with on-line questionnaires (it is feasible) and I did try some matching of responses to the log but I can't say that I learned very much since the response rate was about 2%. That's great if you are in the junk mail business but for survey purposes it is unusable.

  6. Finally, I must conclude that the activity and the site itself are worthwhile and valid. Evidently some needs are being met. Though I did not set out to do this, it is evident to me now that were the page not to exist, somebody would have had to invent it. I think of it as if it were the Amsterdam flower market. All manner of wondrous blooms are on display and sold in box lots in that place but none of them grew there and few of them expire there. They are shipped off somewhere else in the world to carry out the illuminatory role in their brief lives, much like the links on my page.


Professor Craig McKie
Dept of Sociology
Carleton University
1125 Colonel By Drive
Ottawa, Ontario
K1S 5B6
tel (613)520-2600 ext.2626
fax (613) 520-4062
cmckie@ccs.carleton.ca


©,1997. The author, Craig McKie, assigns to the University of New Brunswick and other educational and non-profit institutions a non-exclusive license to use this document for personal use and in courses of instruction provided that the article is used in full and this copyright statement is reproduced. The author also grants a non-exclusive license to the University of New Brunswick to publish this document in full on the World Wide Web and on CD-ROM and in printed form with the conference papers, and for the document to be published on mirrors on the World Wide Web. Any other usage is prohibited without the express permission of the author.