The Embryo Project

Methods

Cera Lawrence

The Embryo Project’s stated goal is to identify Change Agents in the history of embryology—“those factors and forces that have affected and changed the scientific study of embryos” (Maienchein and Laubichler 2006). In pursuit of that goal, the Embryo Project has undertaken to create an online repository and encyclopedia of information about embryology from Aristotle to the present day. As part of a massively collaborative effort, the Embryo Project employs undergraduate researchers as the primary content creators for the Encyclopedia. The Project uses software called Fedora (short for Flexible Extensible Digital Object Repository Architecture) to store and manage the content for the Embryo Project Encyclopedia.

Fedora, developed by researchers at Cornell University and the University of Virginia, is object-oriented, open-source software that was created with libraries, museums, and other educational institutions in mind. It was specifically designed to use the emerging technology of the Semantic Web, which allows objects stored in the repository to be linked to other objects with meaningful semantic relationships. These three design traits (object-oriented, open-source, and Semantic Web-aware) are what make Fedora uniquely suited to accomplish the Embryo Project’s goals. In this thesis I describe in greater detail just what it means for a database architecture to be object-oriented, open-source, and Semantic Web-aware. Then, by relating one undergraduate researcher’s experience working in the Embryo Project, I demonstrate how the use of Fedora aids the creation of a high-quality, publicly available Encyclopedia, and how the use of semantic relationships in that Encyclopedia helps achieve the Embryo Project’s goal.

Abstract of “Telling a Different Story: The History of Science as a Web of Relationships in the Embryo Project”, by Cera Lawrence. Download the full thesis. (PDF, 1.1 MB)

Technical Notes:

Articles submitted to the Embryo Project are first edited for style and content by the editorial team. Successful articles are then marked up in XHTML using the National Library of Medicine's journal article DTD, and metadata and semantic relationship information are added using a vetted list of relationships between objects of interest. XHTML transformations create Dublin Core, article, RELS-EXT, and other datastreams in the Fedora repository, which can then be searched and displayed via a web browser. Adoption of the NLM DTD follows the lead of hundreds of journals that have selected this text-markup framework as a standard for both general markup and archiving of scholarly articles. For the Embryo Project, its adoption is part of a strategy to create a standards-based resource with long-term sustainability and the potential for exchange with a broad range of collaborators.
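To illustrate how a RELS-EXT datastream can express semantic relationships, the following is a minimal Python sketch that builds RDF/XML linking one repository object to others and rejects any relationship not on a vetted list. It is not the Project's actual transformation. The relationship names, the `REL_NS` namespace, and the object identifiers are hypothetical; only the RDF namespace and Fedora's `info:fedora/` URI form are standard.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Hypothetical namespace for project-specific relationships (assumption).
REL_NS = "http://example.org/embryo/relations#"

# Hypothetical stand-in for the vetted list of relationships (assumption).
VETTED_RELATIONSHIPS = {"isAbout", "wasInfluencedBy", "isPartOf"}


def build_rels_ext(subject_pid, relationships):
    """Return RDF/XML that links one repository object to others.

    relationships is a list of (predicate, target_pid) pairs. A predicate
    outside the vetted list raises ValueError.
    """
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("rel", REL_NS)
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(
        root,
        f"{{{RDF_NS}}}Description",
        {f"{{{RDF_NS}}}about": f"info:fedora/{subject_pid}"},
    )
    for predicate, target_pid in relationships:
        if predicate not in VETTED_RELATIONSHIPS:
            raise ValueError(f"unvetted relationship: {predicate}")
        ET.SubElement(
            description,
            f"{{{REL_NS}}}{predicate}",
            {f"{{{RDF_NS}}}resource": f"info:fedora/{target_pid}"},
        )
    return ET.tostring(root, encoding="unicode")


# Example with made-up identifiers.
print(build_rels_ext("embryo:1", [("isAbout", "embryo:2")]))
```

Checking each predicate against a fixed list keeps the web of relationships consistent, so that searches over one relationship type return comparable results.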

The Embryo Project includes selected references from the vast literature pertaining to embryology and developmental biology, as well as descriptions of primary source materials, such as archival collections of documents and records. Wherever possible, citations have been linked to external web sites that help readers locate these resources: citations for books link to OCLC's Open WorldCat, and citations for articles link to OCLC's global registry of citation link resolvers.
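As a rough illustration of how such citation links can be generated, the Python sketch below builds a WorldCat ISBN link for a book and an OpenURL (Z39.88-2004, key/encoded-value form) query for an article. The `RESOLVER_BASE` address is a placeholder, not the address of OCLC's registry, and the citation values are invented. The WorldCat `/isbn/` URL pattern is given as commonly used and should be verified before relying on it.

```python
from urllib.parse import urlencode

# Placeholder resolver address (assumption); substitute the real service.
RESOLVER_BASE = "https://resolver.example.org/openurl"


def book_link(isbn):
    """Build a WorldCat lookup link from an ISBN, dropping hyphens and spaces."""
    digits = isbn.replace("-", "").replace(" ", "")
    return f"https://www.worldcat.org/isbn/{digits}"


def article_openurl(atitle, jtitle, volume, spage, year):
    """Build an OpenURL query describing a journal article."""
    params = {
        "url_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.genre": "article",
        "rft.atitle": atitle,
        "rft.jtitle": jtitle,
        "rft.volume": volume,
        "rft.spage": spage,
        "rft.date": year,
    }
    return f"{RESOLVER_BASE}?{urlencode(params)}"


# Example with invented citation data.
print(book_link("0-00-000000-0"))
print(article_openurl("Example Article", "Example Journal", "1", "10", "2000"))
```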

Citations for publications, archival resources, and similar materials are represented and stored in the Embryo Project Digital Repository using the Metadata Object Description Schema (MODS), a framework for descriptive metadata for bibliographic entities developed by the Library of Congress. When the full text of an archival finding aid is available, that information is represented and stored using the Encoded Archival Description framework (EAD), which is also currently maintained by the Library of Congress. Use of these standards supports the long-term sustainability and exchangeability of information created and compiled by the Embryo Project. Whenever possible, links are provided to freely available copies of the full text of cited literature, or to the OCLC Open WorldCat database to facilitate finding copies in libraries worldwide.
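For readers unfamiliar with MODS, the following Python sketch shows what a very small MODS citation record might look like when built programmatically. It covers only a title, one personal name, and an optional issue date. The function name and its parameters are illustrative assumptions, and a real record in the repository would likely carry many more elements.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"


def build_mods(title, author, date_issued=None):
    """Return a minimal MODS record as an XML string.

    date_issued is optional; originInfo is omitted when it is not given.
    """
    ET.register_namespace("mods", MODS_NS)
    mods = ET.Element(f"{{{MODS_NS}}}mods")
    title_info = ET.SubElement(mods, f"{{{MODS_NS}}}titleInfo")
    ET.SubElement(title_info, f"{{{MODS_NS}}}title").text = title
    name = ET.SubElement(mods, f"{{{MODS_NS}}}name", {"type": "personal"})
    ET.SubElement(name, f"{{{MODS_NS}}}namePart").text = author
    if date_issued is not None:
        origin = ET.SubElement(mods, f"{{{MODS_NS}}}originInfo")
        ET.SubElement(origin, f"{{{MODS_NS}}}dateIssued").text = date_issued
    return ET.tostring(mods, encoding="unicode")


# Example using the thesis named on this page.
print(build_mods(
    "Telling a Different Story: The History of Science as a Web of "
    "Relationships in the Embryo Project",
    "Lawrence, Cera",
))
```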

Digital images made available through the Embryo Project Encyclopedia have been scanned at high resolution using the JPEG 2000 image format. They are presented as JPEG thumbnail images in search results and descriptive views of images; high-resolution views of images are made available in JPEG 2000 and TIFF formats via the open-source djatoka image server.
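The Python sketch below suggests how a request for an image view from a djatoka server might be assembled. The parameter names follow djatoka's OpenURL-based `getRegion` service as generally documented and should be checked against the installed version. The server address and the image identifier are placeholders, not the Project's real endpoints.

```python
from urllib.parse import urlencode

# Placeholder server address (assumption); not the Project's real endpoint.
DJATOKA_RESOLVER = "https://images.example.org/adore-djatoka/resolver"


def region_url(image_id, level, fmt="image/jpeg"):
    """Build a getRegion request for one resolution level of an image."""
    params = {
        "url_ver": "Z39.88-2004",
        "rft_id": image_id,
        "svc_id": "info:lanl-repo/svc/getRegion",
        "svc_val_fmt": "info:ofi/fmt:kev:mtx:jpeg2000",
        "svc.format": fmt,
        "svc.level": str(level),
    }
    return f"{DJATOKA_RESOLVER}?{urlencode(params)}"


# Example: a low resolution level, as might be used for a thumbnail.
print(region_url("https://example.org/scans/plate1.jp2", 1))
```

Because JPEG 2000 stores multiple resolution levels in one file, a server of this kind can derive both thumbnails and detailed views from a single scanned master.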