  • The Computer Genealogist: Documenting Electronic Sources

    Steve Kyner

    In the past, electronic sources have had a hard time getting any respect from genealogists. In part, that was due to the low level of reliability of early electronic content; most of it was simply e-mails among individuals of uncertain skill. Other electronic files were often idiosyncratic compilations of equally uncertain provenance and accuracy. Adding to the problem, early versions of genealogy software either made no provision for source citations at all or did so in ways that anyone striving for scholarly integrity would find clumsy at best.

    With increasing momentum over the last ten years, all that has changed. There are now bountiful resources in electronic format, many of which share the best qualities of revered print publications. Just as importantly, there are now multitudes of family historians whose initial enthusiasm (and in many cases only sustenance) has come from genealogy web sites and CD-ROM publications. Electronic sources have not only achieved a certain measure of respect, they are downright mainstream.

    Mainstream or not, however, electronic sources are confusing for many of those who use them. If one has just consulted a scanned book on CD-ROM, what is the source? Is it the book or the CD-ROM? If a tombstone inscription has just been found in a database linked to a web site, how does that go in a footnote? Are the e-mailed thoughts of a friend a source, or should they even be recorded? All the questions one might ponder with traditional sources are relevant to electronic media, but the medium itself leads to still more questions. This article will attempt to offer some compass points on the electronic map.

    What is an “electronic source”? For purposes of this discussion, it is anything you consult in the course of research that is read from a computer file. The file might contain text or pictures; it might arrive on CD or in the form of a web page, or it might simply be the compiled database of someone using a genealogy program. The file might be the first form in which a particular source existed (as in the case of an e-mail correspondence) or it might not. It could be the transcription or the reproduction of a previously published effort, either in print format or electronic format. There are many forms of electronic source, and often seemingly identical material appears in many different forms. It’s important, however, to know which form is being used, and it’s important to be specific about what is cited. Just as there may be material differences among different editions of a single book, there are likely to be material differences among different electronic forms of reference.

    One example of an electronic source is the New England Historical and Genealogical Register on CD-ROM. The Register is a well-known and highly respected publication, though it has never claimed infallibility for its authors. A standard form of citation for an article in the Register might look like this:

    Gale Ion Harris, “John Edwards of Wethersfield, Connecticut,” New England Historical and Genealogical Register Volume 145 (October 1991): 317.

    Or, in abbreviated format:

    NEHGR 145:317.

    Too many researchers believe that whether they consult the original edition of volume 145 or the CD-ROM edition, the same citation is sufficient. It is not. The Society is no more infallible than its authors, and errors have in fact been found on the CD-ROMs, including missing or mis-ordered pages. I’m not aware of any such errors that actually distort the meaning of the text, but the point is that they could. And there are many electronic products in the world, both CDs and others, which have been much less carefully supervised than was the production of the NEHGR CDs. We can also look forward to eventual new electronic formats for the Register, so ten years from now it might be even more important for someone to know what format and edition was actually consulted.

    No matter how well respected the electronic source is, it’s important to identify it as precisely as possible for what it is. In the example above, “NEHGR 145:317” should become:

    NEHGR [CD-ROM: 1996], 145:317.

    Or in the longer format:

    Gale Ion Harris, “John Edwards of Wethersfield, Connecticut,” New England Historical and Genealogical Register Volume 145 (October 1991): 317 [CD-ROM: 1996].

    The revised citation identifies the medium actually consulted and adds the date of publication of the CD. In its simplest form, that is full disclosure for an electronic source. Some style guides recommend an added notation indicating the date the CD-ROM was viewed. In all cases I can imagine, such a detail is no more useful than knowing the precise date on which the author of a printed book consulted a paper source. Except in rare cases where subsequently destroyed original sources are referenced, a “consultation date” is superfluous for citations of material with fixed publication dates. As noted below, however, it is vital in other contexts.
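    For anyone who keeps citations in a program or script, the revision described above can be sketched in a few lines of Python. This is only an illustration, not a standard tool: the function name add_medium_tag and its parameters are invented here, and it handles only the long form of the citation, in which the bracketed tag follows the page reference.

```python
def add_medium_tag(citation: str, medium: str, year: int) -> str:
    """Append a bracketed medium-and-year tag to a long-form citation.

    Hypothetical helper for illustration; not part of any genealogy package.
    """
    body = citation.strip()
    # Remove a trailing period so the tag lands before the final full stop.
    if body.endswith("."):
        body = body[:-1]
    return f"{body} [{medium}: {year}]."


print_citation = (
    "Gale Ion Harris, “John Edwards of Wethersfield, Connecticut,” "
    "New England Historical and Genealogical Register Volume 145 "
    "(October 1991): 317."
)
print(add_medium_tag(print_citation, "CD-ROM", 1996))
```

    The abbreviated form places the tag after the title rather than at the end, so it would need its own small function.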

    If at some point it is discovered that a prankster inserted a page from the Virkus Compendium in the Register CD-ROM, and I’ve happened to cite that page, I (or someone else) will be able to recognize and trace any erroneous conclusions that result. Additionally, if I know the medium which I consulted in the first place, it’s probably going to be easier for me to go back to the source I used if I want to double check something. There are many obscure sources available electronically, and there are also many obscure sources on my bookshelves and in my files. Knowing the medium I originally consulted may well save me a good deal of wasted time hunting for it.

    The NEHGR CD-ROMs very conveniently provide citation references (identifying the volume, publication date and print-out date) along the bottom of the page when printed out. Not all CDs make it that easy to identify the source of a printout, and in some cases there may not even be recognizable pages. Never assume that everything a useful citation needs is on the printout or on the cover of a CD jewel box. For CDs as well as other forms of electronic sources, try to identify the most direct way to locate the information you’re using, and give that in your citation.

    There is often an element of art in the construction of genealogical citations. They often need to be nested to call attention to the source of a source, or tempered with comments as to the physical qualities of the source. Clarity of communication should always take precedence over blind adherence to a specific format. The best way to determine what should be communicated in a citation is to keep firmly in mind the two primary reasons sources are cited in the first place:

    1. So the researcher knows where information came from;
    2. So someone else can look it up.

    The actual form a source citation takes, and the order in which its various elements appear, are much less important than including all the necessary identifiers in some rational way. If you’re publishing in a journal, that journal’s editor will govern citation format, and it will vary from one journal to another. Nor is published style permanent. An acceptable citation form of 50 years ago may not meet today’s standards. If you’re simply recording sources for your own information, consistency of format is probably more useful than any particular style, though readability is enhanced if the format adheres to conventions customary in the field.

    With electronic sources, it becomes even more important to consider thoughtfully WHY you are recording them in the first place, because both form and content of citations for electronic sources are evolving. Technological innovation creates a moving target as far as just what constitutes a complete citation, and as online sources proliferate we will also see innovative forms of presentation which may not obviously lend themselves to accustomed formats.

    There is no question that electronic sources can be highly useful, but they are also almost never sufficient. By its very nature an electronic source is at least one remove from a verifiable primary source, and often has passed through many intervening processes (scanning, transcription, Optical Character Recognition, etc.) and many iterations and users before reaching your desktop. Can you trust electronic sources? A good genealogist trusts no one! The question of trust is no different for electronic sources than it is for those in print. Does a creditable institution certify the accuracy, the completeness or anything else? If you’re looking at the work of an individual, is that person reputable? Do they provide documentation of their sources? Have you spot-checked and tested their work against an area with which you are familiar? In all cases, can you locate confirmation in the records for information you’re obtaining?

    Government archives generally get high marks for the reliability of information they collect, but most government jurisdictions haven’t gotten beyond the “User’s Guide” stage in digitizing their holdings. The National Archives’ NARA Archival Information Locator (NAIL) is a demonstration of the possibilities for archival source retrieval and also a case in point for one of the pitfalls in the use of rigid citation formats.

    One of the items available in the NAIL project is a scanned image of a Casualty List for the Rough Riders, July 1st to 3rd, 1898. Because it is a product of the National Archives, you can be pretty sure it’s what it’s represented to be. Because it’s a scanned image, you don’t rely on anyone else’s interpretation of what it says. And it might provide the answer to what happened to great-great-uncle John. I could open up my browser and go directly to the image, but the Uniform Resource Locator (URL) or address I’d have to enter is extremely long, and anyone’s chances of getting it entered correctly in a browser window on the first three tries aren’t very good. If I were publishing electronically, I could just insert that address as a link and no one would have a problem with it. In print, it’s a bit much. So in this case, even though I have a direct address for my source, I’m going to cite the way I actually got there, which is a Search of NAIL Digital Copies for the terms “Rough Riders” and “Roosevelt”. The citation would appear as follows:

    Casualty List, Rough Riders, July 1 to 3, 1898, p. 1 in results of keyword search on terms “Rough Riders” and “Roosevelt” in NAIL Digital Copies (all media), U.S. National Archives and Records Administration. Available [Online]: [21 September 1999].

    An added benefit to this method of citation is that it will work well even if the page on which the document of interest appears is renamed or moved. The document name is specified because there are multiple documents to be found with the same search.

    The web site of Ancestry is a popular destination for many researchers. The most effective way to use the site is by searching all available databases for a particular name. Many of the databases available are in fact previously published books that have been scanned or transcribed for electronic searching. When the results of such a search are examined, a heading provides an approximation of the title in which references were found, but nothing like a good bibliographic reference. Fortunately, Ancestry has very responsibly included complete bibliographic information on a separate linked page for each resource. Once again, the precise URL of the page cited is not essential to a useful citation. An example citation for the results of an Ancestry search might look like this:

    Results of Name search on term “Green” in Albert C. Bates, editor, Rolls of Connecticut Men in the French and Indian War, 1755-1762, Vol. I 1755-1757, Hartford: Connecticut Historical Society, 1903 (transcribed by Iris Guertin). Available [Online]: [16 September 1999].

    Most of the information provided in the above citation does not appear anywhere on the web page containing the data used, but it is vital for a number of reasons. The citation clearly communicates the fact that the reference cited was originally a print publication and can therefore probably be found in that format. It identifies the editor and transcriber of the work in question, two significant indicators of the quality of the original work. And the final bracketed date represents the date the material was actually consulted. For electronic files in a dynamic setting such as the World Wide Web or the Internet, that final date is important because such files are continuously subject to editing, revision and other changes that may impact other researchers.
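    The pattern used in the two online citations above (a description of what was found, the site address, and the bracketed date of consultation) can be assembled mechanically. The Python sketch below is one possible way to do it, offered as an illustration only. The function name online_citation is invented here, the address http://example.org/search is a placeholder rather than any real site, and the position of the address between “Available [Online]:” and the bracketed date is an assumption about the form.

```python
from datetime import date

# Month names are spelled out explicitly so the output does not
# depend on the locale settings of the machine running the script.
MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def online_citation(description: str, url: str, consulted: date) -> str:
    """Build an online citation ending with the date the page was consulted.

    Hypothetical helper; the layout is an assumption based on the examples above.
    """
    when = f"{consulted.day} {MONTHS[consulted.month - 1]} {consulted.year}"
    return f"{description.rstrip('. ')}. Available [Online]: {url} [{when}]."


print(
    online_citation(
        "Results of Name search on term “Green” in Albert C. Bates, editor, "
        "Rolls of Connecticut Men in the French and Indian War, 1755-1762",
        "http://example.org/search",  # placeholder address, not a real source
        date(1999, 9, 16),
    )
)
```

    Passing the consultation date as a required argument makes it impossible to produce an online citation without one, which is the point of the paragraph above.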

    Many websites will provide no clue to bibliographic detail. You should insist on it, and if it’s not there, e-mail a request for it. It’s fine to develop a base of knowledge via the Web, but in the hierarchy of quality sources, an original printed page will always rank higher than its electronic reproduction, whether scanned or transcribed. No matter what anyone tells you, a printed page is a lot more likely to be in existence and locatable 100 years from now. Even when dealing with secondary sources, the closer you can get to an original, the more likely you are to avoid errors and deceptions.

    Speaking of errors and deceptions, enthusiastic e-mail messages purporting to identify long lost missing links in various ancestries are the bane of careful researchers. There are times, however, when even the flimsiest excuse for a trail is better than being totally without direction. A chance e-mail may provide the seed for a productive hypothesis. Sometimes such hypotheses are rooted in sources you’d just as soon not admit to using. We beat the drum constantly for people to discard information that is not supported by good documentation, and I am not suggesting that such hygiene be neglected. But if some “flake” posts a theory on a mail list that has a bearing on a problem I haven’t been able to solve, I’m going to make a note of that theory and look for ways to prove or disprove it. That electronic posting is the source or basis for my new hypothesis. And I’m going to give it the full citation treatment that I would give to any census record or birth record. Not because it contributes to the ultimate proof, but because I need to remember that it’s totally unproven and may derive from a dubious source. I won’t use that source if I eventually publish the results of that line of research (though I may render public thanks for the inspiration given), but until it’s proved or disproved, I need to remember where the idea came from. Two examples of e-mail citations follow. The first demonstrates the form for a direct and private e-mail communication, the second treats the same e-mail message as part of a public mail-list discussion. The form used should match the specific circumstance.

    Cherie (, Re: [French-L] Willis French, NY [e-mail to Steve Kyner (]. 19 October 1998.

    Cherie (, “Re: Willis French, NY” [Discussion], French Surname List ( 19 October 1998. Available [Online]: [19 October 1998].

    There are significant differences in the two forms, but I should point out first that I have no idea of the true name or address of this correspondent. The information she (?) provided turned out to be accurate and useful, regarding church memberships in Amsterdam, NY. Copies of a document mentioned in the e-mail have since been located. Some style guides insist on a full name and postal address for every e-mail citation, but the willingness of some individuals to share their knowledge is directly proportional to their sense of privacy and security, and since the information needs to be confirmed anyway, there is not always a need for intrusion on that privacy. The citations above would go into my notes, not into a publication, but it is no less necessary to properly attribute the source.

    For the private e-mail, relatively bare bones information is sufficient: originator, subject, recipient and date. Because I have been known to lose track of, or accidentally delete, e-mails from my hard drive, the reference is actually to a printed copy of the e-mail that is in my files. The hard copy habit is one I strongly recommend.

    If the message was posted to a public list, more information is necessary. Everything that would be referenced in a private e-mail appears in the second citation, but now we have added the name of the public mail list as well as the address of the electronic archive where messages from that list are stored. Such archives make the checking of a message’s contents much easier than is the case with a private e-mail. Often it takes a little digging to determine if there even is an archive, but it is worth the effort in allowing others to easily retrace a series of research steps. In many cases, public mail lists contain highly erudite and well-structured discussions of specific research problems that rank them with the best scholarly work in print.
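    The difference between the two e-mail forms discussed above can be made concrete with a pair of small Python functions. This is a rough sketch under stated assumptions, not a prescribed format: the function names are invented, the addresses cherie@example.org and http://example.org/archive are placeholders, and the punctuation only approximates the two sample citations.

```python
def private_email_citation(sender: str, sender_address: str, subject: str,
                           recipient: str, sent: str) -> str:
    """Private e-mail: originator, subject, recipient and date."""
    return f"{sender} ({sender_address}), {subject} [e-mail to {recipient}]. {sent}."


def list_post_citation(sender: str, sender_address: str, subject: str,
                       list_name: str, archive_url: str,
                       sent: str, consulted: str) -> str:
    """Public list posting: adds the list name, archive address and consultation date."""
    return (
        f"{sender} ({sender_address}), “{subject}” [Discussion], "
        f"{list_name}, {sent}. Available [Online]: {archive_url} [{consulted}]."
    )


# Placeholder addresses; the real ones are not known.
print(private_email_citation(
    "Cherie", "cherie@example.org", "Re: [French-L] Willis French, NY",
    "Steve Kyner", "19 October 1998",
))
print(list_post_citation(
    "Cherie", "cherie@example.org", "Re: Willis French, NY",
    "French Surname List", "http://example.org/archive",
    "19 October 1998", "19 October 1998",
))
```

    Keeping the two forms in separate functions mirrors the advice that the form used should match the specific circumstance.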

    There’s no particular mystery to creating good citations for electronic sources, but there is a good deal of thought and a measure of empathy involved. Think about trying to find your source using nothing more than the information you’ve recorded in a citation, and your citations will be headed in the right direction.

    For a more detailed and thorough discussion of electronic source citations and a style guide to numerous forms, readers should consult Maurice Crouse, “Citing Electronic Information in History Papers,” reprinted in The Computer Genealogist Volume 8, Number 3/4 (May/August, 1999) and available in this current edition.

