Early Modern Data Curation Fellow
Carnegie Mellon University’s Department of English and University Libraries jointly seek an Early Modern Data Curation Fellow to lead data curation activities for the Six Degrees of Francis Bacon (SDFB) project, a digital reconstruction of the early modern social network that scholars and students can collaboratively expand, revise, curate, and critique. The Fellow will leverage expertise in early modern studies along with technical aptitude in order to contribute meaningfully to a rich data lifecycle, including collecting, processing, text mining, analyzing, and archiving data related to the early modern social network.
The Carnegie Mellon Dept. of English is a growing hub for Early Modern Studies, supporting not only SDFB but also the Pittsburgh Consortium of Medieval and Renaissance Studies (PCMRS; http://www.medren.org/). PCMRS, which was founded by the Department of English at Carnegie Mellon, draws members from numerous disciplines (art history, English literature and drama, history, music, philosophy, religious studies, Romance and European Languages & Literatures) and a range of institutions (Carnegie Mellon, Duquesne, the University of Pittsburgh, West Virginia University, Chatham College and Slippery Rock). It coordinates activities among the different campuses in the region and sponsors an active speaker series.
Carnegie Mellon University Libraries has worked actively in partnership with the University’s top-ranked School of Computer Science (SCS) to achieve a digital future for libraries. The successes of CMU Libraries’ Universal Library and Million Book Projects, acknowledged inspirations for Google Books, are now being followed by work on equally challenging initiatives including the Olive project (https://olivearchive.org/) archiving “born digital” executable content such as digital humanities projects, software, scientific models, and games under the direction of PIs Mahadev Satyanarayanan and Gloriana St. Clair.
Six Degrees of Francis Bacon (SDFB; http://sixdegreesoffrancisbacon.com/) is a collaborative, multidisciplinary digital humanities project with wide utility for several subfields in early modern studies. Historians, literary critics, musicologists, art historians, and others have long studied the way that early modern people associated with each other and participated in various kinds of formal and informal groups. Yet their scholarship, published in countless books and articles, is scattered and unsynthesized. By data-mining existing scholarship and documents that describe relationships between early modern persons, SDFB has created a unified, systematized representation of the way people in early modern Britain were connected.
In SDFB’s start-up stage, which has been supported by two Google Faculty Research Awards, team members Christopher Warren, Daniel Shore, Cosma Shalizi, Michael Finegold, and Lawrence Wang have mined a single source, the Oxford Dictionary of National Biography (ODNB), to produce a preliminary data set of 10,000 individuals and to infer, with confidence estimates, a map of the associations between them. Initial results, available at http://sixdegreesoffrancisbacon.com/ and http://www.viewur.com/sixdegrees/ (BETA), already make it possible to visualize and understand the early modern social network in exciting new ways.
The Early Modern Data Curation fellow will bring his/her understanding of key research and pedagogical questions in Early Modern Studies to enhance the project’s impact on the field in four specific ways. The Fellow will be centrally involved in collaborative efforts (a.) documenting, archiving, and refining existing data sets and workflows; (b.) curating the dynamic crowdsourcing interface where users validate and annotate existing data; (c.) coordinating with major text repositories including Google Books, Hathi Trust Research Center, and the Institute for Historical Research, to develop workflows, corpora, and data sets; and (d.) communicating findings in multi-author and single-author publications.
The Fellow will be housed in the English Department in the Dietrich College of Arts and Social Sciences but work jointly across English and the University Libraries. Within the Department of English, the Fellow will work with Asst. Prof. Christopher Warren, Principal Investigator of the Six Degrees of Francis Bacon project. Within CMUL, s/he will be supervised by Gabrielle V. Michalek, Principal Archivist and Head of Archives and Digital Library Initiatives. Data description and metadata capture will be overseen by Gabrielle Michalek and Data Services Librarian Steve van Tuyl. Day-to-day work will involve a disciplinarily diverse and geographically disparate team of SDFB collaborators, including literary historians, librarians, statisticians, network scientists, and software developers.
Within CMU’s larger academic community, the CLIR early modern data curation Fellow will work alongside world leaders in digital strategies, data initiatives, innovations, and design. The Fellow will be encouraged to draw formally and informally from researchers and events across the CMU campus, including those associated with CMU’s Language Technologies Institute, a leader in text mining and the initial home of the reCAPTCHA system; the Human-Computer Interaction Institute, a leader in research related to computer technology in support of human activity and society (supporting labs like ProtoLab, which conducts research on social computing and design, and Social Computing, which conducts research in the design of online communities); and the School of Design, one of the most respected programs in the country, with strengths in communication and interaction design. The Fellow will have ample opportunities to participate in events sponsored by the Pittsburgh Consortium of Medieval and Renaissance Studies and the Medieval and Renaissance Studies Program at the University of Pittsburgh, a short five-minute walk from the CMU campus.
The successful candidate will likely have a PhD in Early Modern English, History, or Library and Information Science with demonstrated research strengths in historicist approaches, digital humanities, book history, and/or early modern networks broadly conceived. The ideal candidate will have a strong technical aptitude and be willing to learn or apply skills in data identification, data preparation, data ingest, and metadata generation.
- Develop and implement workflows for Six Degrees of Francis Bacon datasets including data identification, data preparation, data ingest, and metadata generation.
- Curate the dynamic crowdsourcing interface where users validate and annotate existing data.
- Coordinate with major text repositories including Google Books, Hathi Trust Research Center, and the Institute for Historical Research to develop research workflows, corpora, and data sets.
- Communicate methodologies and research findings in multi-author and single-author publications.
Required Knowledge and Skills
- A PhD in a relevant subfield of early modern studies or Library and Information Sciences with demonstrated expertise in early modern studies.
- Demonstrated ability to work collaboratively and successfully in a team-based environment.
- Demonstrated willingness to learn and implement key standards in data curation.
- Active research in early modern studies, preferably animated by historicist methodologies, book history, and/or network studies.
- Excellent verbal and written communication skills.
Desired Knowledge and Skills
- Ability to blend early modern expertise with technical expertise in prevailing standards and best practices in the development of early modern data repositories.
- Familiarity with or demonstrated capacity to work within the HTRC research environment.
- Proficiency in a current programming language such as Python and/or background in any of the following: Stanford CoreNLP, Gephi, SEASR, XML, R, Neo4j, Dublin Core, RDF.