Challenge
Linked Open Data (LOD) is the entry point to Big Data. As a taxonomist with a strong belief that we have to bring our data to the public, rather than expect the public to look for it, I am determined to make this happen.
The process
This blog post describes and quantifies the workflow, developed and implemented at Plazi, that takes a taxonomic name from its discovery to making it available as LOD.
1. Starting point. While preparing my lectures on chemical communication in ants, I stumbled upon this note in sci-news describing a novel form of social parasitism, based on the discovery of Cephalotes specularis, described in 2014 by Brandão et al. in Zootaxa, for which a DOI is provided (DOI: zootaxa.3796.3.9). Unfortunately, this is a closed access article. A search led, via antcat, to antwiki, where the article is available. To start processing the document into LOD, I added the article to the Biodiversity Literature Repository (BLR) as a closed access article. The record shows the original DOI minted by Zootaxa, as well as an alternative identifier from Zoobank.
2. Adding the name to the Hymenoptera Name Server (HNS), our reference system for ant names. HNS showed that the bibliographic reference of the article had already been added (reference), but a search for the name returned no entry. So I added the name to HNS through the online form. Here it is in HNS, and here in HOL, to which we will link from the treatment that we are about to produce. We need a name server under our own control, since Zootaxa no longer adds new names to Zoobank.
3. To convert the article into a semantically enhanced document, I use our Imagine software, which is already installed on my machine. A manual is available. I added the metadata of the publication; parsed all the taxonomic names, the bibliographic records (which are added to Refbank upon saving), the treatment and document structure, and the material citations; and linked the name to HNS. The result is available here as html or as RDF, which was the goal of the exercise.
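The publication metadata entered in step 3 can, in principle, be retrieved from the article's DOI. As a hedged sketch that is not part of the Plazi tooling described above, the Python below normalizes a DOI string and prepares a request to the public doi.org resolver that uses DOI content negotiation to ask for a BibTeX record (`Accept: application/x-bibtex`). It assumes the registration agency behind the DOI supports that media type; `normalize_doi` and `bibtex_request` are hypothetical helper names, and the request is only built, not sent. The example uses the BLR DOI quoted later in this post.

```python
import urllib.request

DOI_RESOLVER = "https://doi.org/"
# Prefixes commonly found in front of a bare DOI (compared case-insensitively).
KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(raw: str) -> str:
    """Strip resolver URLs or a 'doi:' label and check for the '10.' prefix."""
    doi = raw.strip()
    for prefix in KNOWN_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):].strip()
            break
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI (missing '10.' directory prefix): {raw!r}")
    return doi


def bibtex_request(raw_doi: str) -> urllib.request.Request:
    """Build, but do not send, a content-negotiation request for BibTeX."""
    return urllib.request.Request(
        DOI_RESOLVER + normalize_doi(raw_doi),
        headers={"Accept": "application/x-bibtex"},
    )


req = bibtex_request("doi:10.5281/zenodo.9896")
print(req.full_url)  # https://doi.org/10.5281/zenodo.9896
# Fetching the record needs network access:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```

A string without the `10.` directory prefix, such as the bare `zootaxa.3796.3.9` quoted in step 1, is rejected rather than completed by guessing.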
Time
Time from reading a name to finding the referenced article: 5 min
1. Time to upload the article to BLR: 5 min
2. Time to add the name to HNS: 2 min
3. Time to convert the PDF into a semantically enhanced document and upload it to Plazi: 21 min
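Summing the four reported steps gives the total time from spotting the name to having it online as LOD. The few lines of Python below only restate that arithmetic; the step labels are paraphrased from the list.

```python
# Minutes per step, as reported in the list above.
steps_minutes = {
    "find the referenced article": 5,
    "upload the article to BLR": 5,
    "add the name to HNS": 2,
    "convert the PDF and upload it to Plazi": 21,
}

total = sum(steps_minutes.values())
print(f"Total: {total} min")  # Total: 33 min

# Fraction of the total spent on the semantic markup step.
share = steps_minutes["convert the PDF and upload it to Plazi"] / total
print(f"Markup share: {share:.0%}")  # Markup share: 64%
```

On these figures, the markup in step 3 accounts for nearly two thirds of the total.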
Caveat
This only works within this time frame after training, and it requires an understanding of the semantic structure of a taxonomic work, of its model, and of the tagging tools.
It also depends on having access to all the resources, including Plazi (access can be obtained).
Questions
Why make all this effort?
The advantages are obvious:
1. 90% of name usages look like this: Anochetus grandidieri Forel, 1891, or, as cited by Fisher & Smith, 2008, Anochetus grandidieri Forel. Neither the referenced publications nor the treatments are linked. The most obvious case is the Catalogue of Life, where no links are provided.
2. Already much better is providing the proper bibliographic reference for Forel, 1891 or Fisher & Smith, 2008, either via a link or directly.
3. Even better is a link to a catalogue, such as Anochetus grandidieri Forel 1891 in the Hymenoptera Name Server, or, more explicitly, a persistent URI such as http://bioguid.osu.edu/xbiod_concepts/187786 that allows this name to be cited properly. Ultimately, we need a universal system.
4. The next improvement is a direct link from the name to the respective article. Ideally, the article is deposited in an archive that provides a DOI, e.g. 10.5281/zenodo.9896, as the Biodiversity Literature Repository can do for legacy literature.
5. The next step is a direct link to a digital version of the cited page: Anochetus grandidieri Forel, 1891: 108. This allows the reader to understand directly what the author had in mind when he created the concept (the principle of reproducibility in science).
6. The next step is a direct link to the treatment. Ideally, the treatment has a persistent identifier, such as this HTTP URI: http://treatment.plazi.org/id/1C4EDC17-8AD7-9DD7-F1A5-AB856E8C5BCA.
7. The next step is to link cited treatments to the respective treatment, such as the usage of Anochetus grandidieri by Fisher & Smith, 2008.
8. The final step is to create 5* data by providing all this content in an open, machine-readable, semantically enhanced version: Anochetus grandidieri Forel 1891 or Anochetus grandidieri Forel sensu Fisher & Smith, 2008.
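The progression above, from a bare name string to 5* data, can be sketched in code. The Python below is a minimal illustration, not Plazi's actual pipeline: it parses a name usage such as "Anochetus grandidieri Forel, 1891: 108" into its parts and emits N-Triples that link a treatment URI to a catalogue concept URI, an article DOI and the cited page. The regular expression only covers the simple genus-species-author-year-page pattern used here; the `http://example.org/vocab#` predicates are hypothetical placeholders, not Plazi's ontology (only `rdfs:label` is a standard term); and the identifiers are the example values quoted in the list, combined in one record for illustration only.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical predicate namespace: a placeholder, NOT Plazi's actual ontology.
VOCAB = "http://example.org/vocab#"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Covers only the simple pattern "Genus species Author[,] [Year][: page]".
NAME_USAGE = re.compile(
    r"^(?P<genus>[A-Z][a-z]+) (?P<species>[a-z]+) "
    r"(?P<author>[A-Z][^,:]*?)"
    r"(?:,? (?P<year>\d{4}))?"
    r"(?:: ?(?P<page>\d+))?$"
)


@dataclass
class NameUsage:
    genus: str
    species: str
    author: str
    year: Optional[int] = None
    page: Optional[int] = None

    def label(self) -> str:
        """Human-readable form, e.g. 'Anochetus grandidieri Forel, 1891'."""
        base = f"{self.genus} {self.species} {self.author}"
        return f"{base}, {self.year}" if self.year is not None else base


def parse_name_usage(text: str) -> NameUsage:
    """Split a name usage string into genus, species, author, year and page."""
    m = NAME_USAGE.match(text.strip())
    if m is None:
        raise ValueError(f"unrecognized name usage: {text!r}")
    return NameUsage(
        genus=m["genus"],
        species=m["species"],
        author=m["author"],
        year=int(m["year"]) if m["year"] else None,
        page=int(m["page"]) if m["page"] else None,
    )


def literal(value: str) -> str:
    """N-Triples string literal; escapes backslashes and quotes only."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples(usage: NameUsage, concept_uri: str,
                treatment_uri: str, article_doi: str) -> list[str]:
    """Describe the treatment and link it to concept, article and cited page."""
    s = f"<{treatment_uri}>"
    triples = [
        (s, f"<{RDFS_LABEL}>", literal(usage.label())),
        (s, f"<{VOCAB}treatsConcept>", f"<{concept_uri}>"),
        (s, f"<{VOCAB}publishedIn>", f"<https://doi.org/{article_doi}>"),
    ]
    if usage.page is not None:
        triples.append((s, f"<{VOCAB}citedPage>", literal(str(usage.page))))
    return [f"{subj} {pred} {obj} ." for subj, pred, obj in triples]


usage = parse_name_usage("Anochetus grandidieri Forel, 1891: 108")
for line in to_ntriples(
    usage,
    concept_uri="http://bioguid.osu.edu/xbiod_concepts/187786",
    treatment_uri="http://treatment.plazi.org/id/1C4EDC17-8AD7-9DD7-F1A5-AB856E8C5BCA",
    article_doi="10.5281/zenodo.9896",
):
    print(line)
```

Plain N-Triples keep the sketch dependency-free; a real implementation would more likely use an RDF library and an agreed vocabulary.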
Why not rely on existing resources, such as antwiki or antcat?
How long does it take for this new treatment to propagate on the Web, especially to those who harvest data from Plazi:
HOL
Antweb
GBIF
EOL
Starting time is 20150429:11:57
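To answer the propagation question, one could record, for each harvester, when the treatment first appears there and compare that with the starting time above. The Python sketch below parses timestamps in the `YYYYMMDD:HH:MM` notation used in this post and computes the delays; `propagation_delays` is a hypothetical helper, the observations are placeholders to be filled in by whoever checks HOL, Antweb, GBIF and EOL, no measurements are reported here, and time zones are ignored.

```python
from datetime import datetime, timedelta
from typing import Dict, Optional

STAMP_FORMAT = "%Y%m%d:%H:%M"  # the post's "20150429:11:57" notation
START = datetime.strptime("20150429:11:57", STAMP_FORMAT)  # naive, no time zone


def propagation_delays(
    first_seen: Dict[str, Optional[str]],
) -> Dict[str, Optional[timedelta]]:
    """Delay between START and the first sighting per harvester (None = not yet seen)."""
    delays: Dict[str, Optional[timedelta]] = {}
    for harvester, stamp in first_seen.items():
        if stamp is None:
            delays[harvester] = None
        else:
            delays[harvester] = datetime.strptime(stamp, STAMP_FORMAT) - START
    return delays


# Placeholder observations: replace None with the stamp when the treatment shows up.
observed = {"HOL": None, "Antweb": None, "GBIF": None, "EOL": None}
for harvester, delay in propagation_delays(observed).items():
    print(harvester, "not yet seen" if delay is None else delay)
```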