Civil War Bluejackets: Citizen Science, Machine Learning, and the US Navy Common Sailor
The “digital turn” in Civil War era history has now reached the age of artificial intelligence (AI). In 2022 Cameron Blevins and Christy Hyman challenged historians “even self-professed Luddites—to approach today’s shifting technological landscape with the same intellectual curiosity and rigor that they bring to their studies of the Civil War era.”[i] We have decided to take up the challenge.
Today, historians of all kinds are particularly concerned with the effects of AI-based Machine Learning (ML), especially generative models such as ChatGPT, on teaching and learning, fearing students will be “ghosting” their written assignments.[ii] Others have examined the potential and pitfalls of using ML in historical research.[iii] Our project, “Civil War Bluejackets: Race, Class and Ethnicity in the US Navy, 1861-1865” (CWB), falls on the optimistic side of ML in history research by making positive use of machine learning techniques to rewrite the history of the common US Navy sailor in the Civil War. With the proper human input, ML can enhance the social history of the Civil War era in innovative ways.
CWB is a British Arts and Humanities Research Council grant-funded project led by Northumbria University in partnership with information scientists at the University of Sheffield and the University of Koblenz-Landau. It centers on the US Navy Muster Rolls from the American Civil War, available on the US National Archives (NARA) website. The project’s main aim is to transcribe these recently digitized rolls, creating a powerful new database and research tool for the study of c. 118,000 wartime sailors, most of whom were drawn from among the poorest sections of nineteenth-century American society. This transcribed list will make the digitized rolls more accessible and usable. We will then use that transcription to machine-link to other digitally available resources connected to individual sailors, such as Rendezvous returns, hospital tickets, and, most importantly, pension applications, all currently available online through Fold3.com. The resulting internet resource of these Bluejacket common sailors, so named for their short French-style navy jackets, will link tens of thousands of working-class wartime servicemen to all their digitally available military records. This result will also allow us to use the data generated to understand how the composition of Navy vessel crews changed over time, such as in this example from our pilot study
[Crew Ethnicity and Nativity on USS Louisville, 1862-1865]
which examines the ethnicities and nativities of the crew of the City-Class ironclad USS Louisville between 1862 and 1865. We should also be able to measure other demographics at scale, such as occupation, age, nativity, and even height, perhaps allowing us to understand the health of many working-class Americans in the mid-nineteenth century.[iv] US Navy records are particularly suited to such analysis, because, unlike the Army, a wide range of complete or near-complete naval personnel records have been digitized, including practically all pension records.
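The kind of calculation behind a crew-composition chart like the one above can be pictured with a minimal Python sketch. It is illustrative only: the record layout (a muster year paired with a transcribed birthplace), the function name `composition_by_year`, and the sample rows are invented placeholders, not project data or the project's own code.

```python
from collections import Counter, defaultdict


def composition_by_year(records):
    """Return {year: {nativity: share_of_crew}} for (year, nativity) pairs.

    `records` is a hypothetical flattened view of transcribed muster
    entries; real muster data would need cleaning and de-duplication first.
    """
    counts = defaultdict(Counter)
    for year, nativity in records:
        counts[year][nativity] += 1

    shares = {}
    for year in sorted(counts):
        total = sum(counts[year].values())
        shares[year] = {
            group: number / total for group, number in counts[year].items()
        }
    return shares


if __name__ == "__main__":
    # Made-up rows for demonstration; they do not describe any real crew.
    sample = [
        (1862, "United States"),
        (1862, "Ireland"),
        (1862, "United States"),
        (1863, "Ireland"),
    ]
    for year, groups in composition_by_year(sample).items():
        print(year, groups)
```

The same grouping could be applied to any other transcribed column, such as occupation or age band, by swapping the second element of each pair.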
In designing the project, inspired by the work of climate scientists collecting historic weather data from Royal Navy and US Navy deck logs, we decided to use “Zooniverse” to facilitate our crowdsourced “Citizen Science” effort. Zooniverse is an online platform developed by the Citizen Science Alliance to allow the public to actively participate in major research initiatives.[v]
[Civil War Bluejackets on Zooniverse: Phase 1]
Zooniverse was initially used primarily for scientific analyses such as the examination of space and the cosmos, but the access it provides to large numbers of willing volunteers is increasingly attracting humanities projects, especially those seeking to examine large data sets. Since the project’s launch on Zooniverse in September 2022, Civil War Bluejackets has attracted over 1,600 volunteers, who have made almost 33,000 classifications (individual transcriptions). In this first phase of the project, volunteers were presented with an individual muster sheet and asked for any information on the muster date. Volunteers then transcribed certain workflows (columns in the original muster sheets), among which were name, birthplace, age, occupation (prior to enlistment), citizenship, rating (rank), and height. Another workflow we wanted to analyse was “eyes, hair and complexion.” It was here that race was often identified, either in physical description or more bluntly with terms such as “contraband.” The quarterly musters regularly recorded how many “contrabands” had been enlisted into the ship’s complement, and officers (and the Navy) used the records to aid in the administration of the ship’s crew.[vi]
We also asked the Citizen Science volunteers to draw a colored bounding box around each entry in their chosen workflow and transcribe what they read.
[Creating a “Golden Set” of Data]
These bounding boxes helped our information science co-investigators at Sheffield to develop a “gold standard” set of data, which they then used to create sets of training and test data. The training data is split by workflow, with each workflow’s data used to train a separate deep-learning, neural-network-based transcription model that learns how to “read” the handwriting on the muster sheets.[vii] As a result, all our citizen scientists have helped us reorient the overall Zooniverse project. The separate set of test data is then used to evaluate how accurately the models are able to transcribe handwritten text they have not seen before (i.e., text they were not trained on). This gold standard dataset has proved fruitful. The 33,750 or so transcriptions from vessels whose names begin with the letters A and B have been enough to train our models to read the nineteenth-century handwriting of various US Navy junior officers. The models learned how to transcribe numeric columns, such as terms of service and ship’s number, fairly quickly. Non-numeric data, such as names and places of birth, proved the trickiest, but our models are now capable of achieving character-level accuracy rates of around 98% on numeric columns and around 94% on non-numeric ones. As well as producing the most probable transcription of a piece of handwritten text, the models also provide an estimate of their confidence in the transcription.
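For readers curious about what a “character-level accuracy rate” means in practice, the following is a minimal Python sketch of one common way to compute it: one minus the character error rate, where errors are counted as the edit (Levenshtein) distance between the gold-standard text and the model output. This is an assumption for illustration; we do not claim it is the exact metric or code used by the project, and the function names and example strings are hypothetical.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: fewest single-character insertions,
    deletions, or substitutions turning `reference` into `hypothesis`."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(
                min(
                    previous[j] + 1,         # delete a reference character
                    current[j - 1] + 1,      # insert a hypothesis character
                    previous[j - 1] + cost,  # substitute (or match)
                )
            )
        previous = current
    return previous[-1]


def character_accuracy(pairs) -> float:
    """Accuracy over (gold, predicted) string pairs: 1 - errors / gold chars."""
    total_chars = 0
    total_errors = 0
    for gold, predicted in pairs:
        total_chars += len(gold)
        total_errors += edit_distance(gold, predicted)
    if total_chars == 0:
        raise ValueError("no gold-standard characters to score against")
    return 1.0 - total_errors / total_chars


if __name__ == "__main__":
    # Hypothetical test-set pairs: (volunteer gold standard, model output).
    test_pairs = [("Ireland", "Iceland"), ("Seaman", "Seaman")]
    print(f"{character_accuracy(test_pairs):.3f}")
```

Scoring only on held-out test pairs, as described above, is what makes the figure a measure of how well a model reads handwriting it has not seen before.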
We then had to build a separate form-registration model capable of taking a digitized image of a muster sheet and splitting it into the individual columns and rows present on the original form. Achieving this means that we are able to automatically process the remaining vessels (i.e., those with names starting C-Z) without requiring humans to manually draw the bounding boxes around each cell in the form, a laborious task. Once the form-registration model has identified all of the cells in a new muster sheet form, each of these can be passed to the relevant transcription model to obtain a most-probable transcription and a confidence score.
The next step on Zooniverse will become one of checking the machine output rather than transcription, a much simpler and more user-friendly task.
[Civil War Bluejackets on Zooniverse: Phase 2, Correcting automatic transcriptions]
We are pleased to announce that we are about to move to Phase 2 of the project, where the model selects which transcriptions need human checking, based on its own self-assessed confidence level in its transcription. This means that we will only ask our volunteers to check those pieces of transcribed handwriting for which the model has a low confidence score, further significantly reducing the amount of work that humans need to do. We encourage Muster readers to sign up for this second phase to see how the platform works for historical projects.
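The idea of sending only low-confidence transcriptions to volunteers can be sketched in a few lines of Python. This is a hedged illustration rather than the project's implementation: the `CellTranscription` record, the `split_for_review` function, and the 0.90 threshold are all assumptions introduced here, since the cut-off the project uses is not stated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CellTranscription:
    """One machine-read cell from a muster sheet (hypothetical layout)."""
    sheet_id: str
    column: str
    text: str
    confidence: float  # model's self-assessed confidence, 0.0 to 1.0


def split_for_review(cells, threshold=0.90):
    """Separate cells the model is confident about from those needing a
    volunteer check. The default threshold is an illustrative assumption."""
    accepted, needs_review = [], []
    for cell in cells:
        if cell.confidence < threshold:
            needs_review.append(cell)
        else:
            accepted.append(cell)
    # Show volunteers the least certain readings first.
    needs_review.sort(key=lambda cell: cell.confidence)
    return accepted, needs_review


if __name__ == "__main__":
    # Invented example rows, not real muster data.
    cells = [
        CellTranscription("sheet-001", "age", "23", 0.99),
        CellTranscription("sheet-001", "name", "J. Smyth", 0.62),
        CellTranscription("sheet-001", "birthplace", "Cork", 0.81),
    ]
    accepted, needs_review = split_for_review(cells)
    print(len(accepted), "accepted;", len(needs_review), "sent to volunteers")
```

Raising the threshold sends more cells to volunteers and lowers the risk of unchecked errors; lowering it reduces volunteer workload at the cost of trusting the model more often.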
This initial transcription effort led to a certain self-satisfaction among the team but a challenge from our advisory board made us reflect more on our citizen scientist pool. The ethical awareness of other digital humanities projects encouraged us to think harder about the ethical implications of our work.[viii] In seeking initial ethical approval for our project, we had been aware of the literature around citizen science “crowd sourcing” and the reality that this is “free labor” people provide without remuneration.[ix] Of course, people volunteer for this kind of work and Zooniverse has terms and conditions which allow for the use of the data they collect.[x] It also has a strong privacy agreement against sharing any volunteer personal data. All it asks for is a valid email and a username—even providing your real name is optional. No other information is needed to participate. Yet, the challenge was how did we know who our volunteers were? In a project dedicated to understanding class, race, and ethnicity in the US Navy, how diverse were our transcribers?
With Zooniverse not collecting any user personal data, our only way to progress was to contact our volunteers collectively through Zooniverse intermediaries. Zooniverse staff, for example, distribute our citizen scientist newsletter, and group mail everyone who signed up to the project with any major updates. We decided to tackle the issue by reaching out to genealogical groups that would potentially make our citizen science base more diverse. One of the groups we worked closely with was the Afro-American Historical and Genealogical Society (AAHGS), whose members were particularly interested in identifying African Americans in the musters.
After initial discussions and a workshop, the AAHGS launched a “Memorial Day to Veterans Day” drive encouraging its members to transcribe on Zooniverse, ultimately transcribing thousands of records. An example of the rewards of such collaboration came when one of those volunteers, R. Roberts, who uses the handle @Grobster on Zooniverse, drew our attention to the age of one African American Third Class Boy aboard USS Brandywine.
[In the Footsteps of Frank Branch, African American Bluejacket]
His name was Frank Branch, listed as just 12 years old. But @Grobster went further than just highlighting Branch on the muster roll: they directly engaged with the Bluejackets team in the knowledge creation process, conducting research that greatly aided our efforts to uncover this story. Using multiple digital resources created as a result of his naval service, this research revealed a level of detail about Branch’s life that illuminated not just his wartime experience, but also his life in (and escape from) enslavement and the trials and tribulations he faced as he sought to make a life for himself in the post-war United States.[xi]
@Grobster has since gone on to become a CWB Project Zooniverse Moderator, helping other Citizen Scientists to understand and explore the muster sheets. Together with our other volunteer Zooniverse moderator, Robert Croke (Zooniverse handle @SandyCycler), they are continuing not only to play a major role in administrating the public face of the project but are also engaging in significant amounts of personal research into these sailors and their vessels. What is very apparent to us is that the success of citizen science initiatives depends on a consistent and honest engagement with our citizen scientists. A volunteer community rarely emerges organically and requires encouragement and nurturing through the lifetime of the project. At CWB, this has come in the form of aids and guides on the Zooniverse platform as well as through mechanisms such as YouTube videos, public/online talks/training sessions, and, most importantly, through the dedicated Zooniverse project “Talk” forum where users can raise questions and queries. We also highlight the work volunteers do in a series of posts on our webpage entitled “Bluejacket Community Discoveries.”[xii]
[Bluejacket Community Discoveries]
We believe that the citizen scientists should be publicly acknowledged, with their permission and while preserving their anonymity whenever we can.
CWB is also interested in exploring user motivation and reward at a deeper level. An integral component revolves around learning when, why and how volunteers engage with humanities projects on Zooniverse. We have currently based our recognition of their work on those who engage the most, our superusers, but what about the more casual user? Our superusers, who have become moderators, help us understand what volunteers like about the tasks and what they do not. They tell us of frustrations in transcription, for example, helping us adjust workflows. As moderators they also provide support and encouragement to other users, exploiting their acquired expertise to pre-empt potential mistakes common among new volunteers and to guide them through the Zooniverse process. They have helped us too in co-creating Phase 2.
Ultimately, we intend this project to produce another digital resource for those interested in their ancestors, not just to fill out their family trees, but also to understand the lives of their historical relatives. In turn, the data generated will help us and other scholars analyse the macro issues of the Civil War Union Navy and how its leaders managed a racially and ethnically integrated service. Though there is not nearly as much work on common sailors as there is on soldiers, there are some excellent surveys from Michael J. Bennett, Steven J. Ramold, Dennis J. Ringle, and Joseph P. Reidy. The new database, however, and the fact that most of the major records for all US Civil War sailors (musters, pensions, etc.) are digitized, give us an opportunity to examine the subject in innovative macro ways. Black and white, native and foreign, served together on vessels, but, for example, how did those ratios change over time, and from vessel to vessel, across the entire navy? Another question is what the occupational and age profiles of all sailors were over the course of the War.[xiii] Using this mass of new data that ML has helped provide us, we plan to write a new history of the Civil War common sailor in the US Navy focusing on class, race, and ethnicity.
This machine transcription of the nineteenth-century handwriting of hundreds of US Navy officers may be applicable to other manuscript records, perhaps providing more opportunities to rewrite the social history of the Civil War era and beyond. This potential is just one issue we want to discuss with others at CWB’s final conference, to be held in partnership with the US Naval Academy Museum, in Annapolis, Maryland, January 30-February 1, 2025. Among other topics are the racial, ethnic, and class relations in navies around the world between 1775 and 1914, and the impact naval life had on the working-class communities from which the sailors originated. We therefore invite all those interested in Civil War sailors, or any sailors around the world in the long nineteenth century, to join us for that conference.
Our call for papers is here. For more information, please contact david.gleeson@northumbria.ac.uk or wwshieh@gmail.com
[i] Cameron Blevins and Christy Hyman, “Digital History and the Civil War Era,” Journal of the Civil War Era 12 (March 2022): 80-104, quote on page 97.
[ii] See, for example, Jonathan S. Jones, “Students Critique a ChatGPT Essay,” Perspectives (Sept. 2023), available at https://www.historians.org/perspectives-article/students-critique-a-chatgpt-essay-a-classroom-experiment-september-2023/ accessed July 25, 2024; Royal Historical Society, “Education Policy,” available at https://royalhistsoc.org/policy/education/ accessed July 25, 2024.
[iii] See, for example, the essays in R. Darrell Meadows and Joshua Sternfeld, “Artificial Intelligence and the Practice of History: A Forum,” American Historical Review (Sept. 2023): 1345-1349.
[iv] On height, nutrition and health see Roderick Floud, Kenneth Wachter, and Annabel Gregory, Height, Health and History: Nutritional Status in the United Kingdom, 1750–1980 (New York: Cambridge University Press, 1990).
[v] “What is Zooniverse?” https://www.zooniverse.org/about accessed July 25, 2024.
[vi] For the importance of recording “Contraband” see, for example, E. K. Owen to [David D.] Porter, Jan. 4, 1864, David Dixon Porter Papers, Huntington Library, Pasadena, California; S. F. Dupont to W. E. Le Roy, June 3, 1863, Record Group 45, Subject File US Navy, 1775-1910, Box 263, NARA.
[vii] For more information on how the computer learns how to “read” this writing, see “Machine Learning and Your Transcriptions” on CWB’s YouTube channel here https://www.youtube.com/watch?v=1l6giQr5qTg&t .
[viii] See “Colored Conventions Project Principles” at https://coloredconventions.org/about/principles/ accessed July 25, 2024.
[ix] Hauke Riesch and Clive Potter, “Citizen science as seen by scientists: Methodological, epistemological and ethical dimensions,” Public Understanding of Science 23 (Jan 2014): 107-120; Julie McDonough Dolmaya, “The ethics of crowdsourcing,” Linguistica Antverpiensia, New Series – Themes in Translation Studies 11 (2011), online at https://lans-tts.uantwerpen.be/index.php/LANS-TTS/article/view/279 accessed July 25, 2024; Vanessa Williamson, “On the Ethics of Crowdsourced Research,” Political Science and Politics 49 (Jan 2016): 77-81.
[x] “Zooniverse User Agreement and Privacy Policy,” https://www.zooniverse.org/privacy accessed July 25, 2024.
[xi] You can read about this research into Frank Branch on our website here https://civilwarbluejackets.com/2023/09/20/bluejacket-community-discoveries-on-the-trail-of-an-african-american-child-in-the-union-navy/ and here https://civilwarbluejackets.com/2023/11/14/bluejacket-community-discoveries-an-update-on-the-search-for-frank-branch-african-american-child-in-the-u-s-navy/.
[xii] See “Category: Citizen Science Discoveries,” https://civilwarbluejackets.com/category/citizen-scientist-discoveries/ accessed July 25, 2024.
[xiii] Michael J. Bennett, Union Jacks: Yankee Sailors in the Civil War (Chapel Hill: University of North Carolina Press, 2004); Steven J. Ramold, Slaves, Soldiers, Citizens: African Americans in the Union Navy (DeKalb: Northern Illinois University Press, 2002); Dennis J. Ringle, Life in Mr. Lincoln’s Navy (Annapolis: Naval Institute Press, 1998); Joseph P. Reidy, “Black Men in Navy Blue During the Civil War,” Prologue 33 (Fall 2001), available at https://www.archives.gov/publications/prologue/2001/fall/black-sailors accessed July 30, 2024.