MashBill is a web annotation tool being developed for Civil War Governors of Kentucky by Brumfield Labs. It solves a significant structural problem for the project. Editorial work on CWGK texts has been performed in a FileMaker-based solution called DocTracker, used by a number of editorial projects. And while DocTracker has an annotation module capable of producing a glossary of named entities, DocTracker is only accessible to users with a paid FileMaker license.
Because CWGK received grant funding from the NHPRC to annotate its documents through Graduate Research Associates working remotely from universities across the country, the project hoped to create a web tool that was accessible from any location and did not require a software license fee.
CWGK also believed that DocTracker's ID annotation module would require significant modification to produce and export the standalone TEI-XML documents which are necessary to create an independent and interconnected network of entity records.
After initial exploration in the CWGK version of DocTracker and discussions with developers and digital humanists involved in similar projects around the world, CWGK and Brumfield Labs decided to create a standalone annotation tool, MashBill.
The planned annotation workflow using MashBill is diagrammed here and explained in greater detail below.
The annotation process will begin when a CWGK user logs into Hypothes.is, a tool which allows users to highlight text and create custom annotations on existing web pages. In this case, the user will view a document on this site and highlight all unique entities on the page.
This information is sent from Hypothes.is to MashBill, which will match the highlighted text to an index of known references to entities. For example, MashBill will ask the user to confirm if the text "Hopkinsville Ky." refers to the known entity of Hopkinsville, Kentucky and if "Gove. Bramlette" refers to Thomas Elliott Bramlette.
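The matching step described above can be sketched as a fuzzy lookup against an index of known references. This is only an illustration and not MashBill's actual code: the index contents, the entity identifiers, the function names, and the similarity cutoff are all assumptions made for the example.

```python
import difflib
import string

# Hypothetical index of known references -> entity identifiers (illustrative only).
KNOWN_REFERENCES = {
    "hopkinsville ky": "place-hopkinsville",
    "hopkinsville kentucky": "place-hopkinsville",
    "gov bramlette": "person-bramlette",
    "thomas elliott bramlette": "person-bramlette",
}


def normalize(text):
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    stripped = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(stripped.split())


def suggest_entities(highlight, index=KNOWN_REFERENCES, cutoff=0.8):
    """Return candidate entity ids for the annotator to confirm, best match first."""
    keys = difflib.get_close_matches(
        normalize(highlight), list(index), n=3, cutoff=cutoff
    )
    candidates = []
    for key in keys:
        # Several reference strings may point at the same entity; list it once.
        if index[key] not in candidates:
            candidates.append(index[key])
    return candidates


print(suggest_entities("Hopkinsville Ky."))
print(suggest_entities("Gove. Bramlette"))
```

An empty result would correspond to the case where no known entity matches and the annotator creates a new entry.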
If the person, organization, place, or geographical feature does not exist, the user will create a new entry, writing the results of their research in a biographical/informational text field.
Behind the scenes, MashBill will create a new TEI-XML document that contains the new entity's unique identifier, metadata, text biography, and citations. That TEI-XML document will be stored in the CWGK GitHub repository, where all project files are kept not only for the use of project staff but also so that researchers can download them as a corpus and search the XML for data patterns.
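As a rough illustration of what generating such a record might involve, the sketch below builds a minimal person entry with Python's standard library. The element layout (a TEI-style person element with persName, note, and bibl children), the function name, and the identifier format are assumptions for this example; the project's actual schema is not specified here, and the record values are placeholders.

```python
import xml.etree.ElementTree as ET

# Qualified name for the xml:id attribute.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def build_person_record(entity_id, name, biography, citations):
    """Serialize a minimal, TEI-style person record (assumed layout)."""
    person = ET.Element("person", {XML_ID: entity_id})
    ET.SubElement(person, "persName").text = name
    ET.SubElement(person, "note", {"type": "biography"}).text = biography
    bibl_list = ET.SubElement(person, "listBibl")
    for citation in citations:
        ET.SubElement(bibl_list, "bibl").text = citation
    return ET.tostring(person, encoding="unicode")


# Placeholder values only; "N0001" is a made-up identifier.
record = build_person_record(
    "N0001",
    "Thomas Elliott Bramlette",
    "Biography text written by the annotator.",
    ["First citation.", "Second citation."],
)
print(record)
```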
See sample TEI-XML entity records for:
Each time an existing entity record is modified (if, for example, new research or additional documents mean a biography needs to be expanded), MashBill will pull the existing TEI-XML from GitHub and push back the modified version.
This same push-pull process from GitHub will also allow MashBill to automatically insert reference tags containing the unique entity identifier into the text of the document. MashBill will pull the document XML down from the GitHub repository, insert the reference codes around the text highlighted by the user in Hypothes.is, and push a newly marked-up transcription back into GitHub. When annotation work is completed on a collection, CWGK editors can batch upload the new, annotated transcriptions into Omeka along with the corresponding entity files. Modifications to the Omeka style sheet will allow the transcriptions and annotations to display hyperlinks between and among documents and annotated entity records.
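The tag-insertion step can be pictured with a simplified sketch that works on a plain-text transcription and character offsets. The use of an rs element with a ref attribute, the offset-based highlight format, and the function name are assumptions for illustration; a real implementation would need to operate on the existing document XML rather than on plain text.

```python
from xml.sax.saxutils import escape, quoteattr


def insert_references(text, highlights):
    """Wrap each highlighted span in a reference tag.

    highlights is a list of (start, end, entity_id) tuples with
    non-overlapping character offsets into text (assumed format).
    """
    pieces = []
    cursor = 0
    for start, end, entity_id in sorted(highlights):
        if start < cursor:
            raise ValueError("highlights must not overlap")
        # Copy the untouched text before the highlight, then the tagged span.
        pieces.append(escape(text[cursor:start]))
        pieces.append(
            "<rs ref=%s>%s</rs>"
            % (quoteattr("#" + entity_id), escape(text[start:end]))
        )
        cursor = end
    pieces.append(escape(text[cursor:]))
    return "".join(pieces)


line = "Gove. Bramlette wrote from Hopkinsville Ky."
place_start = line.index("Hopkinsville Ky.")
marked_up = insert_references(
    line,
    [
        (0, len("Gove. Bramlette"), "person-bramlette"),
        (place_start, place_start + len("Hopkinsville Ky."), "place-hopkinsville"),
    ],
)
print(marked_up)
```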
MashBill will also allow CWGK to track and display relationships between identified people, organizations, and places. This will enable the project to trace and visualize the social and geographical networks within its documentary data set.
The most basic level of relationship tracking is co-occurrence, when entities appear alongside one another in a document or in the text of an entity biography. CWGK and the developers at Brumfield Labs hope to move beyond co-occurrence, however. Inspired by the system of encoding relationship types developed by the Six Degrees of Francis Bacon project, MashBill will allow annotators to tag relationships described in the documents themselves.
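Co-occurrence, the baseline described above, is simple to compute once each document has a list of identified entities. The sketch below counts how many documents each pair of entities shares; the input format and the identifiers are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count, for every pair of entities, the documents in which both appear.

    documents maps a document id to the entity ids identified in it.
    """
    counts = Counter()
    for entity_ids in documents.values():
        # Sort and de-duplicate so each unordered pair is counted once per document.
        for pair in combinations(sorted(set(entity_ids)), 2):
            counts[pair] += 1
    return counts


# Made-up document and entity identifiers.
sample = {
    "doc-1": ["person-a", "person-b", "place-c"],
    "doc-2": ["person-b", "person-a", "person-a"],
}
for pair, n in cooccurrence_counts(sample).most_common():
    print(pair, n)
```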
After all entities for a given document have been identified, the annotator will tag known relationships between the identified people, places, organizations, and geographical features and assign broad category tags (familial, political, economic, military, social, etc.) to the relationships.
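One way to picture a tagged relationship is as a small record linking two entity identifiers, a category, and the document in which the relationship is described. The field names and the closed category set below are assumptions for illustration; the list of categories above ends in "etc.", so the real vocabulary may be broader.

```python
from dataclasses import dataclass

# Assumed category vocabulary, taken from the examples named in the text.
CATEGORIES = {"familial", "political", "economic", "military", "social"}


@dataclass(frozen=True)
class Relationship:
    source: str    # entity id of the first party (hypothetical format)
    target: str    # entity id of the second party
    category: str  # one of CATEGORIES
    document: str  # id of the document that describes the relationship

    def __post_init__(self):
        if self.category not in CATEGORIES:
            raise ValueError("unknown relationship category: %r" % self.category)


def by_category(relationships, category):
    """Return only the relationships tagged with the given category."""
    return [r for r in relationships if r.category == category]


tagged = [
    Relationship("person-a", "person-b", "political", "doc-1"),
    Relationship("person-a", "person-c", "familial", "doc-1"),
]
print(by_category(tagged, "political"))
```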
In addition to the networks traced by the interlinking of documents and annotations, MashBill will allow CWGK to track broad categories of relationships as they appear in the documents.
Nor is CWGK limited to the familiar social network web. Other forms of data visualization may include a color-coded matrix built from a sample document set, or an arc diagram that shows the same data in a different form. Final decisions about how this site will incorporate this visualized data will come in the next year.
Planning and early prototyping for MashBill are now complete. The project team will work through the end of 2016 to produce a working version of the tool so that the eight NHPRC-funded Graduate Research Associates and on-site CWGK staff can begin to annotate the 1,500 documents scheduled for full publication in the fall of 2017.
The first year of work with MashBill should be an exciting time for CWGK. Not only will the broader context of the project's documents come into sharper focus, but both CWGK and Brumfield Labs will get a good sense of how MashBill works with a variety of remote users, opening up the possibility of future crowdsourced annotations. Just as Six Degrees of Francis Bacon invites the expertise of professional researchers and students in the classroom to add to its network of known early modern people, CWGK is interested in exploring ways to connect to the deep knowledge of local and family history groups in Kentucky and across the United States.